The following discussion presents a summary of the method and 
results of an investigation made by the author in the summer and fall 
of 1930 for the Faculty of Yale College. Two degrees were being 
offered by the College, the Bachelor of Arts and the Bachelor of Phi- 
losophy. The differential training received in fulfilling the require- 
ments for the A.B. degree consisted, in principle, in a greater amount 
of training in Latin (and Greek) than was required of candidates for 
the Ph.B. degree; if each full year of work in a given course in prepara- 
tory school is counted as six hours, in the terminology of credit-counting 
then in use in Yale College, the criterion of permissive candidacy for 
the A.B. degree may be said to have been approximately the satis- 
factory completion of thirty hours of work in Latin (and Greek). As 
part of a general consideration of the advisability of continuing to offer 
the two degrees, differentiated in this manner, the investigation here 
summarized was directed to the single question of whether or not the 
men who had had at least the minimum of thirty hours of work in 
Latin (and Greek) tended on the whole, because of that fact, to do 
better academic work, as measured in terms of grades received, than 
did the men who had had less than thirty hours of work in Latin (and 
Greek). (From this point, in the interest of brevity, Greek will not 
be separately mentioned, but the term “Latin” will be used with 
the understanding that it includes both Latin and Greek.) There 
are many considerations, of course, involved in the determination 
of the value of Latin as part of the so-called liberal education; this 
investigation is concerned only with the specific consideration of the 
value of a Latin-discipline as a contributor to the quality of a student’s 
work in other academic subjects. 
Grades received in work completed in Latin were not considered 
in connection with the grades received in other fields of work, but 
merely the number of hours completed in Latin together with the 
grades in the other fields of work. The factor of relative ability in 
Latin could not be taken as of comparative significance for the investi- 
gation. There would tend to be a direct correlation between grades in 
Latin and grades in other fields arising merely from the transfer of 
general ability from field to field. An attempt might have been made 
to allow for the factor of transferred general ability by using the method 
of partial correlation, but when the available data had been broken 
down into the sub-groups necessary for homogeneity in respect of 
several qualitative factors, the data falling in each such sub-group 
would not have been sufficiently numerous to permit a trustworthy 
application of that method. With hours of Latin alone as the basic 
criterion of classification, the comparisons between the quality of work 
in other fields shown by men with more Latin and the quality of such 
work shown by men with less Latin tend to be biased to some extent 
in the direction of a more favorable showing on the part of the men 
with the greater number of hours of Latin. This favoring of the “more 
Latin” group in the comparisons follows from the fact that it would 
probably be the men of sufficient general ability to have maintained 
relatively high grades in Latin who would tend most strongly to con- 
tinue with extra hours of Latin. This indirect inclusion of the factor of 
relative levels of grades received in Latin must be kept in mind in inter- 
preting the results of the comparisons made. 

The classification on the basis of hours of Latin was adopted for 
the investigation instead of a classification on the basis of the degree 
taken in order to avoid the difficulty of individual aberrations from 
the general relationship between the degree taken and the amount of 
work done in Latin, and also to minimize the effect of any element of 
selectivity which might exist in the choice of the degree to be taken. 
Under the many influences affecting the choice of degree there might 
exist a tendency for the men offering themselves as candidates for one 
of the degrees to be an inherently more able group than the men offer- 
ing themselves as candidates for the other degree, regardless of any 
differential training received or not received in completing the require- 
ments for those degrees. With men classified on the basis of hours of 
Latin completed at successive stages in the academic career, so that an 
observation could be taken from a man’s record at any such stage 
without reference to whether or not certain requirements for a particular 
degree had been satisfied at that particular time, there was made 
possible a finer grouping of the men according to the relative amounts 
of differential training received up to the time of doing a certain grade 
of work in other fields. By comparing the average quality of work as 
between such more narrowly defined groups, the influence of any 
element of selectivity in the degrees would be almost certainly elimi- 
nated. The comparisons made in the investigation were ordinarily 
between the quality of work done by men who, at the time of doing 
such work, had completed something less than thirty hours of Latin and 
the quality of work done by men who had completed thirty hours or 
more of Latin, since that division by hours approximated the line 
of demarcation established in the respective requirements for the two 
degrees. 

Besides the amount of training in Latin a man has received, there 
are a number of other factors which are possibly or certainly possessed 
of some influence in helping to determine the quality of work a man 
does in other fields, as measured in terms of grades received. Before 
significant comparisons can be made between the respective qualities 
of work done by men with more and less Latin, the groups established 
on the basis of Latin-hours must have been made essentially homogene- 
ous in respect to these other factors. The most important of such 
factors would be the following: (1) the man’s age; (2) extra-curricular 
disturbances, whether academic, social, or personal; (3) extra hours of 
work carried; (4) personal ability or inherent intellectual capacity; 
(5) interest in the particular subject of work; (6) character of training in 
preparatory school; (7) extent of experience with the academic routine; 
(8) type of subjects studied; (9) grading standards in the subjects 
studied. For the purpose of the investigation these nine factors were 
handled in varying ways. 

The factors of age and extra-curricular disturbances were allowed 
for in part by exclusions from the field of observations. To make all 
the data used essentially homogeneous in respect of the student’s age 
no man’s record was considered if that man’s date of birth was more 
than twenty-four years or less than twenty years prior to the date of 
his graduation from Yale College, nor was it considered if his date of 
birth was not available in the records. A further allowance for age 
was involved in the treatment of the factor of extent of experience with 
the academic routine. The extreme effects of extra-curricular dis- 
turbances were similarly allowed for by excluding from consideration 
the record of any man who did not complete his work for one of the 
degrees in the ordinary course of three years following Freshman Year; 
the great diversity of factors leading to interruptions of the normal 
course of academic progress would be expected to have varying effects 
upon a man’s work in different fields of study and might have affected 
considerably some of the groups to be compared; the records eliminated 
for this reason were a small proportion of the total. The factors of 
extra hours of work and of personal ability or inherent intellectual 
capacity were considered as being sufficiently uniform over the closely 
defined pairs of “hours of Latin” groups used for the essential comparisons 
to render averages taken over those “hours of Latin” groups 
practically free of any net contribution from either of these factors. 
The process of averaging by “hours of Latin” groups was also relied 
upon to eliminate any disproportionate influences of the factors of age 
and extra-curricular disturbances not directly excluded; the exclusion 
of records on the basis of these two factors, moreover, helped to remove 
the more heterogeneous extremes of the influence of the factor of 
personal ability. The factor of interest in the particular subject 
of study was treated in part of the investigation by taking the record of 
each man’s work only in the field of his Concentration or Major study; 
very few groups could be established on this basis, however, because of 
scarcity of numbers, most such groups being in the field of English 
and the remainder in History and Economics; for the greater part of 
the investigation reliance had to be placed upon the uniformity of this 
factor over the particular “hours of Latin” groups directly compared 
and the consequent sufficiency of the process of averaging to render the 
groups comparable in respect of this factor. 

Special groupings of the data were made to care for the influences 
of the remaining four factors. The most desirable treatment of the 
factor of character of training in preparatory school would have been 
to group the men according to each specific preparatory school 
attended; there were not sufficient data to permit this refined subdivi- 
sion, but substantial homogeneity in respect of this factor was probably 
obtained by recognizing the major difference in type of preparation 
and grouping the men according to whether they attended private 
school, public school, or a mixture of the two; men who transferred 
to Yale College from another college were entirely eliminated from 
consideration. The factor of extent of experience with the academic 
routine, together with its corollary of the extent to which a man may 
have established a background of general knowledge useful in the 
pursuit of numerous specific subjects, was allowed for by taking in 
separate groups the record of each man’s work in his Sophomore, 
his Junior, and his Senior years respectively. Since the influence of a 
differential training in Latin might obviously be expected to be much 
greater in some fields of academic study than in others, it was necessary 
to group the observations on grades according to the respective fields 
of instruction in which those grades were received. It was necessary 
to go farther. The factor of grading standards in the specific subjects 
taken by a man has a large influence upon the quality of that man’s 
work as indicated by the grades he has received; there were evident 
such considerable variations among the grading standards of different 
courses that comparisons of men on the basis of their respective average 
grades in all courses taken might be said fairly to show not so much 
differences in the intellectual ability of the men as differences in their 
selection of courses; it was necessary, therefore, in order to have a 
reasonable homogeneity in the comparison-groups in respect of this 
factor to group the observations on grades according to the specific 
courses in which those grades were given. 

The general field of observations for the investigation was given 
by the Classes of 1926 to 1930 inclusive, for each of the three years of 
work in Yale College (the Freshman Year at Yale being under a sepa- 
rate administration), omitting any man who did not take a degree at 
the end of his third year in College, any man who transferred to Yale 
from another college, any man whose date of birth was more than 
twenty-four years or less than twenty years prior to the date of his 
graduation from Yale College, and any man whose date of birth was 
not available in the records. The investigation covered the records of 
five Classes over seven years of academic work in all. The available 
records of those few Classes prior to the Class of 1926 which had passed 
through Yale under the system of the two degrees were omitted from 
consideration so as to allow ample time for the full working out of any 
processes of readjustment in methods of preparation for college and in 
the reactions of undergraduates in such matters as the election of 
courses, which might have been initiated by the establishment of the 
two-degree system; the Classes of 1926 to 1930 were presumably homo- 
geneous in this respect. Observations from all these Classes were 
taken in single combination, without segregation by the individual 
Classes, as a previous investigation on a different subject had indicated 
no tendency for successive Classes over the period here involved to be 
any better or any worse than the others in average quality of academic 
work. 
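The exclusions that define the field of observations amount to a filter applied to each man’s record. The Python sketch below is illustrative only: the record layout and the field names (`class_year`, `transferred`, `degree_after_three_years`, `birth_date`, `graduation_date`) are hypothetical, and the twenty-to-twenty-four-year age window is assumed to be inclusive at both ends.

```python
from datetime import date

def years_before(d, n):
    """Return the date n calendar years before d (a Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year - n)
    except ValueError:
        return d.replace(year=d.year - n, day=28)

def eligible(rec):
    """Apply the study's exclusions to one hypothetical student record (a dict)."""
    if not 1926 <= rec["class_year"] <= 1930:
        return False
    # Transfers, and men who did not finish in the normal three years, are dropped.
    if rec["transferred"] or not rec["degree_after_three_years"]:
        return False
    birth = rec.get("birth_date")
    if birth is None:  # date of birth not available in the records
        return False
    grad = rec["graduation_date"]
    # Born no more than 24 and no fewer than 20 years before graduation (assumed inclusive).
    return years_before(grad, 24) <= birth <= years_before(grad, 20)

# Usage with made-up records:
base = {"class_year": 1928, "transferred": False,
        "degree_after_three_years": True, "graduation_date": date(1928, 6, 20)}
print(eligible({**base, "birth_date": date(1906, 3, 1)}))
print(eligible({**base, "birth_date": date(1903, 1, 1)}))
print(eligible({**base, "birth_date": None}))
```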
The total field of observations was first divided in three parts 
according to the type of preparatory school attended by the men, 
i.e. Private (including tutoring schools), Public, and Mixed (where a 
man attended both private and public school). In each of these parts 
observations were taken separately for work in Sophomore, Junior, 
and Senior years respectively. For each man there was recorded the 
cumulative total hours of Latin completed at the end of his Freshman 
year (including such hours completed in preparatory school), at the 
end of his Sophomore year, and at the end of his Junior year. Grades 
received by a man in a particular year in specific courses in other 
fields of work were recorded in conjunction with the hours of Latin 
completed by the man by the end of the year preceding. Grades used 
were term-grades, and in two-term courses the averages of the two 
term-grades. Grades were taken in all courses (except in subjects 
definitely unrelated to the general academic curriculum, such as Mili- 
tary and Naval Science and Applied Physiology) in which total enroll- 
ment for the Classes of 1926 to 1930 seemed sufficiently large to make 
possible significant analysis. No course was considered, however, in 
which there had been during the period involved a change of instruc- 
tors or a shift in teaching-direction which might make probable an 
alteration of grading standards in that course. Changes in the cata- 
logue-numbers of courses, replacements of one course by another under 
the same catalogue-number, and other such mechanical variations 
in the records were checked and adjusted. 
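The pairing of each grade with the Latin hours completed by the end of the preceding year can be sketched as a grouping step. This is a minimal illustration under stated assumptions: the dictionary layout and field names are hypothetical, hours are kept as exact values rather than banded into “Hours of Latin” groups such as 0-12 or 33-54, and the exclusion of particular courses is not modelled.

```python
from collections import defaultdict

# Grades in a given college year are paired with the Latin hours
# completed by the end of the year before it.
PRECEDING_YEAR = {"Sophomore": "Freshman", "Junior": "Sophomore", "Senior": "Junior"}

def course_grade(term_grades):
    """A one-term grade, or the average of the two term-grades of a two-term course."""
    return sum(term_grades) / len(term_grades)

def classify(students):
    """Collect grades by (school type, college year, course, prior Latin hours)."""
    cells = defaultdict(list)
    for s in students:
        for year, course, term_grades in s["grades"]:
            hours = s["latin_hours_end_of"][PRECEDING_YEAR[year]]
            key = (s["school"], year, course, hours)
            cells[key].append(course_grade(term_grades))
    return dict(cells)

# One made-up student; hours are cumulative and include preparatory school.
student = {
    "school": "Private",
    "latin_hours_end_of": {"Freshman": 24, "Sophomore": 30, "Junior": 30},
    "grades": [("Sophomore", "French G", [70]),
               ("Junior", "History K", [74, 78])],
}
for key, grades in classify([student]).items():
    print(key, grades)
```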

Grades were originally taken in twenty-eight separate courses 
lying in fifteen different fields of instruction; insufficient observations 
were afforded by five of these courses; the records from the remaining 
twenty-three courses in thirteen different fields of instruction yielded 
seventy-one “Preparatory School and College Year” sub-groups with a 
sufficient number of observations for significant analysis. For each 
of these seventy-one sub-groups the recorded grades were divided in 
two, three, or four “Hours of Latin” groups, depending upon the 
possibility of significant differentiation in each case, with at least a 
division, except in four cases where it was impossible, between “less 
than thirty hours” and “thirty hours or more.” The total of the 
“Hours of Latin” groups for the twenty-three courses, under the 
various “Preparatory School and College Year” groupings, was 
two hundred forty-three. There was also recorded for eight fields 
of Majors the average grade received each year by each man in all 
the courses he had taken in the particular year in his field of Major; 
only three fields of Major proved to contain a sufficient number of 
observations; of the twenty-seven “Preparatory School and College 
Year” groups in which the records for these fields were divided fifteen 
included enough observations to be usable; these fifteen groups were 
subdivided in fifty-seven “Hours of Latin” groups. The primary 
material for the analysis consisted, therefore, in the distributions of 
grades in three hundred “Hours of Latin” groups jointly classified 
according to type of preparatory school, according to year in college, 
and according to specific course of instruction or field of Major. The 
following examples are given to illustrate the nature of this classifi- 
cation of the records. 


CLASSES OF 1926 TO 1930

Subject        Year of work   Preparatory school   Hours of Latin   Number of men   Median grade
History K      Junior         Private              0-12             26              71.8
                                                   18-24            21              75.0
                                                   30               76              71.9
                                                   33-54            35              74.7
[illegible]    Senior         Mixed                0-12             26              73.5
                                                   18-24            16              74.2
                                                   30               32              77.5
                                                   36-54            17              72.9
[illegible]    Senior         Public               0                17              78.6
                                                   12-24            45              75.5
                                                   30               50              79.5
                                                   33-54            31              79.3

The first step in the analysis after this process of classification 
was to secure a measure of the average level of grades in each of the 
three hundred “Hours of Latin” groups. Medians were used as the 
most suitable averages for these distributions, as in a number of 
instances arithmetic means would have been greatly affected by a 
few extremely irregular items and consequently would not have been so 
representative of the typical central tendencies of the respective 
groups. The Medians of the grades for the various “Hours of Latin” 
groups, within each “Preparatory School and College Year” group 
under each course of instruction, were then examined comparatively. 
The chief dividing line for the inter-group comparisons was taken as 
between the groups of less than thirty hours of Latin and the groups 
of thirty hours or more of Latin, since the chief point of the investigation 
was to discover whether or not the men who had had at least a 
minimum of thirty hours of work in Latin tended, because of that fact, 
to do better academic work, as measured in terms of grades received, 
than did the men who had had less than thirty hours in Latin. 
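A toy illustration of why the Median was preferred: the grades below are invented, and one extremely irregular item is enough to pull the arithmetic mean well away from the typical grades while leaving the Median almost unaffected. The sketch takes simple medians of raw grades; it does not attempt to reproduce how the original Medians were computed from the grade distributions.

```python
from statistics import mean, median

# Made-up grades for one "Hours of Latin" group; the last item is an
# extremely irregular grade of the kind that can distort a mean.
grades = [70, 72, 74, 76, 78, 10]

print(f"mean   = {mean(grades):.1f}")    # pulled far down by the single outlier
print(f"median = {median(grades):.1f}")  # stays with the typical grades
```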

In the case of thirty of the seventy-one “Preparatory School and 
College Year” groups under the twenty-three separate courses involved 
there was at least one “Hours of Latin” group lying below the level 
of thirty hours which had in each instance a Median larger than or as 
large as the Median of each of the “Hours of Latin” groups covering 
thirty hours or more. In the “Private School-Junior Year” group, 
for example, in the course which we may call History K the Medians 
of the “Hours of Latin” groups were as follows: 

Hours of Latin   Number of men   Median grade
0-12             26              71.8
18-24            21              75.0
30               76              71.9
33-54            35              74.7

the Median of the eighteen to twenty-four hours group being larger 
than the Median of each of the groups covering thirty hours or more. 
These cases occurred in each of the thirteen fields of instruction covered, 
with the exception of Spanish and Philosophy, but tended to occur 
with less relative frequency in English and French than in the other 
fields; they occurred in the Private, Public, and Mixed School groups, 
tending to occur with relatively great frequency in the Public School 
groups; they occurred in the Sophomore, Junior, and Senior Year 
groups, but the tendency to relatively greater frequency of occurrence 
in the Sophomore Year groups was merely a result of the fact that there 
were proportionately more Sophomore Year groups from the fields 
of instruction where the frequency of occurrence of these cases tended 
to be relatively great. These thirty cases gave evidence, then, 
that in the particular “Preparatory School and College Year” groups 
in the particular courses concerned there was no tendency for a 
previous differential training in Latin to be of assistance in the work 
of these particular courses. The opposite question of whether or 
not the previous differential training in Latin was an actual hindrance 
in the work of these particular courses was not investigated, but 
these thirty “Preparatory School and College Year” groups were 
simply dropped from further consideration. 
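The rule used to drop a case can be stated compactly: a “Preparatory School and College Year” group is set aside when the largest Median below thirty hours is at least as large as the largest Median at thirty hours or more. A minimal Python sketch, assuming each “Hours of Latin” group is represented by the lowest number of hours in its band together with its Median (a representation chosen here for illustration), applied to the History K figures:

```python
def split_at_thirty(groups):
    """Split (lowest Latin hours in band, Median) pairs at thirty hours."""
    below = [m for hours, m in groups if hours < 30]
    at_or_above = [m for hours, m in groups if hours >= 30]
    return below, at_or_above

def shows_no_advantage(groups):
    """True if some below-thirty Median is at least as large as every thirty-plus Median."""
    below, at_or_above = split_at_thirty(groups)
    return max(below) >= max(at_or_above)

# History K, Private School, Junior Year
history_k = [(0, 71.8), (18, 75.0), (30, 71.9), (33, 74.7)]
print(shows_no_advantage(history_k))
```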

Remaining under the twenty-three separate courses there were 
forty-one “Preparatory School and College Year” groups in each of 
which there was at least one “Hours of Latin” group covering thirty 
hours or more with a Median larger than the Median of each of the 
“Hours of Latin” groups lying below the level of thirty hours. Take, 
for example, the “Private-Sophomore” group of French G: 

Hours of Latin   Number of men   Median grade
0-12             27              65.0
18-24            47              68.1
30               65              69.4
36-48            42              71.4

the Median of the thirty-six to forty-eight hours group (or in this 
case the Median of the thirty hours group also) being larger than 
the Median of each group lying below the level of thirty hours. Of 
the thirteen fields of instruction covered by these courses, Mathematics 
and Biology were the only ones in which such a case did not occur, but 
the records available from the field of Mathematics were very scant. 
These forty-one cases gave, then, a preliminary indication that in 
these particular “Preparatory School and College Year” groups in 
the particular courses concerned there did exist a tendency for a 
previous differential training in Latin to be of apparent assistance in 
the work of these particular courses. The strength of this indication 
was somewhat qualified at the outset, since in fourteen of these forty-one 
cases there was one “Hours of Latin” group covering thirty hours 
or more which had a Median smaller than, or no larger than, the 
Median of one of the “Hours of Latin” groups lying below the level 
of thirty hours; these fourteen cases, however, were carried along 
with the others for further analysis. An example of such a case is 
given by the “Private-Junior” group of English H: 

Hours of Latin   Number of men   Median grade
0-12             32              74.7
18-24            26              75.0
30               76              73.9
33-54            61              75.8

the Median of the thirty-three to fifty-four hours group being larger 
than the Median of each group below the level of thirty hours, but the 
Median of the thirty hours group being at the same time smaller than 
the Median of one of the groups (in this case smaller than the Median 
of both groups) below the level of thirty hours. 
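The three outcomes described so far (dropped, positive, and positive but qualified) can be sketched as one classification function. As before, each “Hours of Latin” group is represented, by assumption, as a pair of the lowest hours in its band and its Median; the examples use the History K, French G, and English H figures quoted above.

```python
def split_at_thirty(groups):
    """Split (lowest Latin hours in band, Median) pairs at thirty hours."""
    below = [m for hours, m in groups if hours < 30]
    at_or_above = [m for hours, m in groups if hours >= 30]
    return below, at_or_above

def classify_case(groups):
    below, at_or_above = split_at_thirty(groups)
    if max(at_or_above) <= max(below):
        # Some below-thirty group matches or beats every thirty-plus group.
        return "dropped"
    if min(at_or_above) <= max(below):
        # Positive, but one thirty-plus group is no larger than some below-thirty group.
        return "positive, qualified"
    return "positive"

cases = {
    "History K (Private-Junior)": [(0, 71.8), (18, 75.0), (30, 71.9), (33, 74.7)],
    "French G (Private-Sophomore)": [(0, 65.0), (18, 68.1), (30, 69.4), (36, 71.4)],
    "English H (Private-Junior)": [(0, 74.7), (18, 75.0), (30, 73.9), (33, 75.8)],
}
for name, groups in cases.items():
    print(name, "->", classify_case(groups))
```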

Of the fifteen “Preparatory School and College Year” groups under 
the three fields of Majors (nine groups in English, three in History, 
and three in Economics) there were two cases in which there was at 
least one “Hours of Latin” group lying below the level of thirty hours 
which had a Median larger than the Median of each of the “Hours of 
Latin” groups covering thirty hours or more. One of these cases was 
“History-Private-Sophomore” and the other was “Economics-Private-Sophomore.” 
These two cases ran as follows: 

Hours of Latin   Number of men   Median grade

(History Majors-Private-Sophomore)
0-12             35              75.0
18-24            30              78.5
30               50              77.5
36-54            16              71.5

(Economics Majors-Private-Sophomore)
0-12             22              71.8
18-24            18              74.2
30               27              71.2

The other thirteen cases, each having at least one “Hours of Latin” 
group covering thirty hours or more in which the Median was larger 
than the Median of each of the “Hours of Latin” groups lying below 
the level of thirty hours, were held for further analysis. An example 
of these thirteen cases is given by the “Public-Junior” group under 
English Majors: 

Hours of Latin   Number of men   Median grade
0                40              80.0
12-24            60              80.8
30               76              82.2
36-54            26              82.5

There were, therefore, fifty-four “Preparatory School and College 
Year” groups, forty-one of them relating to work in twenty separate 
courses in eleven different fields of instruction and thirteen of them 
relating to general work in three fields of Majors, which required 
further consideration. The question remaining to be answered in 
connection with each of these fifty-four cases was whether or not the 
average grade in the “Hours of Latin” group covering thirty hours or 
more was sufficiently larger than the average grade in the “Hours of 
Latin” group lying below the level of thirty hours to be significant of a 
real difference between the two “Hours of Latin” groups, significant 
of an effective influence from differential training in Latin, rather 
than merely indicative of a fluctuation of sampling, indicative, that is, 
of the compounded effect of the operations of all other individually 
unimportant factors which might be summarized under the name of 
chance. This remaining question had in reality a double aspect: 
First, whether the observed positive differences in average grades for 
the specific groups under the Classes of 1926 to 1930 were significant 
of a true difference due to differential training in Latin or were indica- 
tive merely, as a matter of probability, of the uncancelled influences 
of other factors which were not completely eliminated by the processes 
of omission, grouping, and averaging; second, whether such positive 
differences between the work of men with a Latin background of 
thirty hours or more and the work of men with a Latin background of 
less than thirty hours would tend to disappear or would have a greater 
likelihood of persisting over similarly classified groups taken from 
other Classes which had already entered or were yet to enter Yale 
College, granting the assumption that no considerable changes should 
occur in general educational conditions in preparatory schools and 
in the College or in general personal conditions among the men in 
respect of private, academic, and social life, as those conditions had 
existed during the years in which the Classes of 1926 to 1930 were 
preparing for and passing through the College. 

In this final examination of the relations between the Medians from 
the “Hours of Latin” groups in the forty-one “Preparatory School 
and College Year” groups relating to work in twenty separate courses 
and in the thirteen “Preparatory School and College Year” groups 
relating to general work in three fields of Majors, in each case that 
“Hours of Latin” group covering thirty hours or more which had the 
largest Median was compared with that “Hours of Latin” group lying 
below the level of thirty hours which had the smallest Median. In 
the case of “Geology R-Private-Junior,” for example, the 0 hours 
group had a Median nearly as large as the Median of the thirty hours 
group, but the testing comparison for this case was made between the 
Median of the twelve to twenty-four hours group, which was smaller 
than the Median of the 0 hours group, and the Median of the 
thirty-three to sixty hours group, which was larger than the Median of the 
thirty hours group: 

Hours of Latin   Number of men   Median grade
0                19              78.4
12-24           107              77.6
30              112              78.8
33-60            77              80.7

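The pair-selection rule just described can be sketched as follows, using the Geology R figures. Each group is represented here, by assumption, as a tuple of its label, the lowest hours in its band, and its Median; the tie-breaking behaviour of `min` and `max` (the first group listed wins) is a choice of this sketch, not of the study.

```python
def select_pair(groups):
    """Return (smallest-Median group below thirty hours, largest-Median group at thirty or more)."""
    below = [g for g in groups if g[1] < 30]
    at_or_above = [g for g in groups if g[1] >= 30]
    low = min(below, key=lambda g: g[2])
    high = max(at_or_above, key=lambda g: g[2])
    return low, high

# Geology R, Private School, Junior Year: (label, lowest hours in band, Median)
geology_r = [("0", 0, 78.4), ("12-24", 12, 77.6), ("30", 30, 78.8), ("33-60", 33, 80.7)]
low, high = select_pair(geology_r)
print(f"compare {low[0]} ({low[2]}) with {high[0]} ({high[2]}): D = {high[2] - low[2]:.1f}")
```

Because it pits the weakest below-thirty group against the strongest thirty-plus group, this selection gives the most favorable showing possible to any effect of Latin.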
This method of selecting the pairs of ‘“‘Hours of Latin” groups for 
specific comparison tended, of course, to give the most favorable 
showing possible to any influence present from the differential training 
in Latin. The significance of the observed positive differences in 
Medians was tested in each case by the relation of the difference to 
the Standard Error of the difference. In handling the Standard 
Errors of the Differences in Medians allowance was made where 
necessary, on the basis of the theory of small samples, for the limited 
number of observations included in some of the ‘‘ Hours of Latin” 
groups which were compared. The lowest degree of probability which 
might be taken safely as the point of demarcation between a significant 
value produced by differential influences and a chance value produced 
by fluctuations of sampling, granted that samples are not biased, may 
be stated as approximately ten in one thousand; in the case of a sample 
of at least thirty observations the probability would be approximately 
ten in one thousand that a given difference between measures was 
only a “‘chance-difference”’ if the difference were 2.50 times its Stand- 
ard Error; if the difference were 3.00 times its Standard Error, the 
probability that it was merely a ‘‘chance-difference’”’ would be only 
one in one thousand, that is, it would be a practical certainty that 
there did exist a ‘‘true-difference”’ exceeding zero. As the difference 
is a large multiple of its Standard Error, the probability that the 
difference is due to chance decreases, and the probability that the 
difference is of positive significance increases. There follow two 
illustrations of this testing of the significance of a difference between 
the Medians: 
“Hours of Latin”   Number  Median  Standard    Standard   Difference  Standard        D/SE
groups compared    of men  grade   deviation   error of   of medians  error of
                                   of grades   median     (D)         difference (SE)

(Geology R-Private-Junior)
12-24              107     77.6    5.4         0.7
33-60              77      80.7    4.9         0.7        3.1         1.0             3.10

With D/SE equal to 3.10, the probability is only about one in one thousand that the difference 
might arise purely by chance; there is probably a “true-difference” greater than zero. 

(English T-Private-Sophomore)
0                  67      78.0    7.7         1.2
30                 112     79.4    7.9         0.9        1.4         1.5             0.93

With D/SE equal to 0.93, the probability is over one hundred in one thousand that the difference 
might be due entirely to the operation of chance; there probably is not a significant positive 
difference. 
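The two illustrations can be recomputed as a sketch. The formula used for the Standard Error of a Median, √(π/2)·σ/√N, is an assumption: it is the usual large-sample formula for roughly normal data, and it reproduces the tabulated values 0.7, 0.7, 1.2, and 0.9. The sketch rounds the Standard Error of the difference to one decimal as the table does, uses a one-tailed normal-curve probability, and omits the small-sample allowance mentioned in the text, so it is not offered as the author’s exact procedure.

```python
import math

def se_median(n, sd):
    """Large-sample Standard Error of a Median, assuming roughly normal grades."""
    return math.sqrt(math.pi / 2) * sd / math.sqrt(n)

def se_difference(se1, se2):
    """Standard Error of the difference between two independent Medians."""
    return math.hypot(se1, se2)

def one_tailed_p(z):
    """Normal-curve probability of a ratio at least this large arising by chance."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def compare_pair(low, high):
    """low, high: (number of men, Median grade, standard deviation) for each group."""
    (n1, med1, sd1), (n2, med2, sd2) = low, high
    d = med2 - med1
    # Round the Standard Error of the difference to one decimal, as the table does.
    se = round(se_difference(se_median(n1, sd1), se_median(n2, sd2)), 1)
    ratio = d / se
    return round(d, 1), se, round(ratio, 2), one_tailed_p(ratio)

for name, low, high in [
    ("Geology R-Private-Junior", (107, 77.6, 5.4), (77, 80.7, 4.9)),
    ("English T-Private-Sophomore", (67, 78.0, 7.7), (112, 79.4, 7.9)),
]:
    d, se, ratio, p = compare_pair(low, high)
    print(f"{name}: D = {d}, SE = {se}, D/SE = {ratio}, "
          f"chance about {1000 * p:.0f} in one thousand")
```

Under these assumptions the normal-curve chance is about one in one thousand for Geology R and well over one hundred in one thousand for English T, in line with the captions.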


Among the forty-one comparisons for the groups relating to work 
in twenty separate courses, in twenty-eight cases the relation between 
the difference in Medians and the Standard Error of the difference 
was such that the chances that the difference was due merely to fluctua- 
tions of sampling were over twenty in one thousand; in eight additional 
cases the chances were between twenty in one thousand and eleven in 
one thousand; five cases remained in which the chances were ten in 
one thousand or less, that is, in which there probably was a positive 
significance to be attached to the difference in Medians. These five 
cases were those of the “Private-Sophomore” group in a course in 
English, the “Private-Sophomore” group in a course in French, the 
“Private-Junior” and the “Private-Senior” groups in a course in 
History, and the “Private-Junior” group in a course in Geology. It 
should be observed that all of these cases applied to groups of men who 
prepared for college at private schools. 

The course in English showing a significant difference in Medians 
is a course in composition and literature, covering such authors as 
Chaucer, Malory, Spenser, Bacon, Milton, Pope, and Byron; this 
literature is especially rich in classical allusions, and with allowance 
for a further element in the course of some attention to linguistic 
roots, there is sufficient reason for expecting a priori that an adequate 
previous training in Latin would be of assistance in the work of the 
course. The course in History in which the significant differences were 
found is concerned with the Middle Ages, a field in which a previous 
training in Latin might obviously be expected to be of advantage 
because of an overlapping in subject-matter; the other available groups 
for Private School men relating to work in History were in the courses 
on the history of the United States and the history of modern Europe 
and no advantage from a differential training in Latin was evident in 
these cases. The course in French in which a significant difference in 
grades was found is a course devoted to conversation and composition; 
the only other available group for Private School men relating to work 
in French was in a course in French literature, where no significant 
difference in grades was found; it seemed to be suggested, therefore, 
but not clearly established because of the limited amount of evidence, 
that a certain type of differential training in Latin, undoubtedly on 
the basis of linguistic similarity, might be of assistance in the study 
of French composition but not in the study of French literature. The 
course in Geology in which a significant difference in grades appeared 
is a course in organic evolution, the treatment being such that some 
advantage carried over from a differential training in Latin might 
possibly, but not indubitably, be expected; there was no evidence of 
such advantage for the work in this course in the case of men who 
prepared at private schools but took the course in Senior year instead 
of in Junior year; the better judgment on this case might be that it 
represented that odd chance of occurrence which is quite possible in 
spite of the relatively high degree of probability against it. In general, 
the advantage of a differential training in Latin for work in other sub- 
jects was localized to specific instances where a direct interrelation of 
subject-matters was evident. 

Among the thirteen comparisons for the groups relating to general 
work in three fields of Majors, in seven cases the relation between 
the difference in Medians and the Standard Error of the difference 
was such that the chances that the difference was due merely to fluc- 
tuations of sampling were over twenty in one thousand; in two addi- 
tional cases the chances were approximately twenty in one thousand; 
four cases remained in which the chances were ten in one thousand or 
less. These cases with significant differences in Medians were all in
the field of English. Of the four cases, three applied to men who
prepared for college at private schools, relating to their work in
Sophomore, Junior, and Senior years respectively; the fourth case applied
to men who prepared at public schools, but related to their work in English
in Senior year alone, the cases relating to their work in Sophomore
and Junior years not showing any significant differences in Medians.
The Public School men majoring in English who had the greater Latin
training tended to have obtained that training in college rather than
in preparatory school to a much greater extent than the Private School
men majoring in English who had the greater Latin training. In the
“English Majors-Private” group for general work in English in
Sophomore year, sixty-three per cent of the men already had thirty hours
or more of Latin at the end of Freshman year, while in the “English
Majors-Private” group for work in Senior year sixty-nine per cent of
the men were found to have had thirty hours or more of Latin by the
end of Junior year, giving a rate of increase in college in the acquisition
of thirty hours or more of Latin of 9.5 per cent; in the “English
Majors-Public” group for general work in English in Sophomore year,
thirty-three per cent of the men already had thirty hours or more of Latin
at the end of Freshman year, while in the “English Majors-Public”
group for work in Senior year fifty-one per cent of the men were found
to have had thirty hours or more of Latin by the end of Junior year,
giving a rate of increase in college in the acquisition of thirty hours or
more of Latin of fifty-five per cent. A significant difference in Medians
for the general work in English thus appeared only in Senior year for
men who prepared at public school and majored in English, whereas
such significant differences were found for the work in all three years in
college for men who prepared at private school and majored in English.
Taken together with the fact that the Public School men majoring in
English who by Senior year had a Latin background of at least thirty
hours had obtained a relatively large proportion of that background after
reaching college instead of in the preparatory schools, this pattern
clearly implied that a differential training in Latin obtained in public
schools was not the equivalent of such a training obtained in private
schools as a possible source of advantage in the pursuit of studies in
other academic fields. This same implication had previously been raised
by the fact that in the comparisons over the groups relating to work in
individual courses all the cases which showed significant differences in
Medians applied to groups of men who had prepared at private schools.
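The two “rates of increase” quoted above are relative increases (the change divided by the Sophomore-year percentage), not differences in percentage points. A quick check of the arithmetic, with a hypothetical helper name:

```python
def relative_increase(before_pct, after_pct):
    """Relative (not percentage-point) increase, expressed in per cent."""
    return 100 * (after_pct - before_pct) / before_pct


private = relative_increase(63, 69)  # "English Majors-Private": 63% -> 69%
public = relative_increase(33, 51)   # "English Majors-Public": 33% -> 51%
print(f"Private: {private:.1f}%  Public: {public:.1f}%")  # Private: 9.5%  Public: 54.5%
```

The public-school figure of 54.5 per cent is what the text rounds to “fifty-five per cent.”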

The groups covered under the field of English Majors were some- 
what differently constituted, of course, from the groups covered under 
various individual courses in English; the latter included the grades in 
those specific courses for all men taking those courses, while the former 
included all grades for work in any course in English but only for those
men who were majoring in English. The groups covered under the 
field of English Majors were presumably more homogeneous in respect 
of the factor of degree of interest in the subject of English than were 
the groups covered under the various separate courses in English. 
This presumption furnished the reason for taking up the Majors groups 
for separate analysis. The results, stated above, of the comparisons 
over the groups classified by fields of Majors showed that for men with 
a special interest in the subject of English the work done in English 
would be of better quality on the average in the case of men with a 
larger rather than a smaller background of Latin, provided that Latin 
background was not of the public school variety. Nothing similar 
was shown for men with presumably special interest in the subjects of 
History or Economics, as these fields of Majors did not produce any 
groups with significant positive differences in Median grades associated 
with a more extensive background in Latin. There may have been 
other fields of special interest, such as the Romance Languages, in 
which the same situation as in English would have been found to exist 
if there had been sufficient data to permit the analysis; this possi- 
bility was not of particular importance for the general conclusions of 
the investigation, however, since the three fields of English Majors, 
History Majors, and Economics Majors included the great bulk of all 
students passing through Yale College. 

In summary, the investigation was based on the records of the men 
in five college Classes over seven years of academic work; various fac- 
tors of bias were treated by the processes of limited omission and of 
grouping; of all the grades received by the men of these five Classes 
throughout their careers in college the observations utilized by the 
investigation represented a sample of about one in four; out of twenty- 
two fields of instruction open to men in the College, aside from the 
directly eliminated fields of the Classics, Applied Physiology, and 
Military and Naval Science, thirteen fields afforded data for the 
investigation; the nine fields which did not offer sufficient data for the 
analysis were Astronomy, Botany, Chemistry, Fine Arts, Germanic 
Languages, Music, Semitic Languages, Political Science, and Italian; 
the impossibility of applying the analysis, because of scarcity of data, 
to the records given by courses with relatively small enrollments was 
the only possible source of lack of representativeness in the records of 
the investigation, but this possibility was not considered to be of 
sufficient importance to vitiate the findings of the investigation; these 
findings, in as free generalization as seemed legitimate, were that a
differential training in Latin, provided the differential training was 
acquired in private preparatory schools or after entering Yale College, 
would help a man to do work of better quality, as measured in terms of 
grades received, in certain specialized courses, such as French composi- 
tion and the history of the Middle Ages, wherever the subject-matter 
of such courses was directly tied to the knowledge gained in a study of 
Latin, and would help a man to do work of better quality in a relatively 
intensive study of the field of English, with the possibility that it 
might help a man to do work of better quality in a relatively intensive 
study of some few other fields where the background of knowledge 
gained in the study of Latin could serve rather directly as a source of 
information on the subject-matter of those fields. There was no 
evidence at all of any value in a more extended study of Latin as an 
intellectual discipline, serving to extend the scope of intellectual 
capacity in whatever field it might be applied. These conclusions 
should be understood, of course, in terms of “Latin as taught” and
“other subjects as taught,” with the possibility remaining that some
change in the type or method of instruction in Latin or some change in 
the manner of presentation of other subjects might affect the relation- 
ship between the extent of a man’s background in Latin and the quality 
of his work in other academic fields. 
ANALYSIS OF A COMPLEX OF STATISTICAL
VARIABLES INTO PRINCIPAL COMPONENTS 


HAROLD HOTELLING 
Columbia University 


(Continued from September issue.) 


8. DETERMINATION OF PRINCIPAL COMPONENTS FOR INDIVIDUALS 


To determine from his test scores the value of a principal component
$\gamma$ for an individual, the formula

$$\gamma = \frac{a_i z_i}{k} \qquad (33)$$

(the summation over all the n tests being indicated by the repeated
subscript i, as in the last section) would be appropriate if there were no
error of measurement, so that the test scores could be taken as identical
with the true scores $z_i$. It is however evident that if the reliabilities
of the tests vary widely, the weights of the more reliable tests should
be increased relatively to the others. The ordinary rule for combining
independent observations is that the weights should be inversely
proportional to the variances of the chance errors;¹ this rule must
however be modified in the present case, since the tests which
constitute the observations are not independent; and indeed, such a
weighted mean would be the same for all the $\gamma$'s, and would have
nothing to do with the a's.

We shall estimate the value of $\gamma$ for an individual as the linear
function $\gamma'$ of his test scores such that the mean value in the population
of

$$(\gamma' - \gamma)^2$$

shall be a minimum. This criterion gives the same results as requiring
the correlation of $\gamma'$ and $\gamma$ to be a maximum, except that the
coefficients in the linear function $\gamma'$ may be multiplied by a constant
which is determined by the former but not by the latter condition.


If the analysis has been performed upon a matrix of raw correlations,
with reliability coefficients in the principal diagonal, the
simple formula (33) provides immediately the solution of the problem.
In what follows, we suppose that the analysis has been performed
upon the matrix of correlations corrected for attenuation, as in the
example of Section 5.

¹ This is the system of weights which makes the variance of a weighted mean
a minimum. It may also be deduced from the work by Truman L. Kelley on
pp. 212-213 of Interpretation of Educational Measurements, World Book Company,
1927.

The variances of the true scores $z_i$ have been taken as unity and
those of their chance errors $\epsilon_i$ as $\sigma_i^2$; consequently the variance of
the observed score $z_i + \epsilon_i$ is $1 + \sigma_i^2$ in these units. But now
let us denote by $y_i$ the observed score expressed in standard measure,
i.e. put

$$y_i = \frac{z_i + \epsilon_i}{\sqrt{1 + \sigma_i^2}} = (z_i + \epsilon_i)\sqrt{r_I}, \qquad (34)$$

where $r_I = 1/(1 + \sigma_i^2)$ is the reliability coefficient of the ith test.
The difference between the sample mean and that of the population
we ignore, as being a small quantity of higher order than we are
considering, and take both these means to be zero. For the products
with which we shall deal we have the following population means,
or expectations:

$$E y_i y_j = r'_{ij}, \qquad (35)$$

the observed correlation between the scores, equal to unity if i = j,
and otherwise given by (32). Also, from (34), since $E z_i z_j = r_{ij}$,
we have

$$E y_i z_j = \sqrt{r_I}\, r_{ij}, \qquad (36)$$

where the summation convention does not apply, though I = i, the
use of capital letters as subscripts serving to distinguish such cases;
but in formulae such as those below, summation with respect to j
is to be understood, as this lower-case letter occurs twice; in each term
of the sum, J is to have the same value as j.

From (33) and (36),

$$E y_i \gamma = \frac{\sqrt{r_I}\, r_{ij} a_j}{k}.$$

Now by (16),

$$r_{ij} a_j = k a_i,$$

and putting

$$b_i = \sqrt{r_I}\, a_i,$$

we obtain

$$E y_i \gamma = b_i. \qquad (37)$$

Now putting

$$\gamma' = c_i y_i, \qquad (38)$$

the quantity to be made a minimum by a proper choice of the $c_i$ is

$$T = E(\gamma' - \gamma)^2 = E(c_i y_i - \gamma)(c_j y_j - \gamma) = r'_{ij} c_i c_j - 2 b_i c_i + 1, \qquad (39)$$

the last expression following from (35), (37), and the fact that $\gamma$
is expressed in standard measure, so that $E\gamma^2 = 1$.

Differentiating (39) with respect to $c_i$ $(i = 1, \ldots, n)$, we
obtain the n equations

$$r'_{ij} c_j = b_i. \qquad (40)$$

By any of the methods used for solving normal equations, (40) may
be solved for the $c_i$.

Let $R'_{ij}$ be the cofactor of $r'_{ij}$ in the determinant of these
correlations, divided by this determinant. Then, just as in (4),

$$R'_{hi}\, r'_{ij} = \delta_{hj}, \qquad (41)$$

where $\delta_{hj}$ is 1 if h = j and 0 otherwise. Multiplying both sides of
(40) by $R'_{hi}$, summing with respect to i, and using (41), we find that
the solution of (40) may be written in the form

$$c_h = R'_{hi}\, b_i. \qquad (42)$$
If all or nearly all the principal components are to be expressed
numerically in terms of the test scores, it is best to compute the
$n^2$ quantities $R'_{ij}$, and then for each component to use (42). The
$R'_{ij}$ may be found most readily by applying to (41) a method such
as that of Doolittle for solving normal equations. The coefficients
in these equations are the uncorrected correlations, as in (40); but
since the right members are replaced in turn by the columns

$$\begin{pmatrix}1\\0\\0\\\vdots\\0\end{pmatrix},\quad \begin{pmatrix}0\\1\\0\\\vdots\\0\end{pmatrix},\quad \ldots,\quad \begin{pmatrix}0\\0\\0\\\vdots\\1\end{pmatrix},$$

and since in the solution all these columns can be carried along
simultaneously, the work is much reduced below that required for a
direct solution of (40). This procedure is readily checked throughout
by carrying along an additional column, the entry in each row being
the sum of all the preceding entries in that row.

For the four tests we have been using as an illustration, the raw
correlations are:

$$\begin{matrix} 1. & .633 & .241 & .059 \\ & 1. & -.055 & .065 \\ & & 1. & .425 \\ & & & 1. \end{matrix}$$

The procedure described above gives for the inverse matrix, that of the
$R'_{ij}$:

$$\begin{matrix} 1.959 & -1.292 & -.646 & .244 \\ & 1.865 & .529 & -.271 \\ & & 1.441 & -.609 \\ & & & 1.261 \end{matrix}$$


These matrices are of course to be thought of as square and
symmetrical; a column is read by moving down to the last of the entries
in that column as written above, and then to the right. With this
understanding, the columns are to be multiplied by those of the
matrix of b's below; this is obtained from that at the end of Section 5
by multiplying each row by the square root of the reliability coefficient
(given in Section 7) for the corresponding test.

Values of $b_i = a_i\sqrt{r_I}$

                 Component
Test i      1       2       3       4
  1       .779   −.425   −.279   −.232
  2       .652   −.589    .271    .217
  3       .582    .637   −.360    .179
  4       .436    .491    .346   −.104
Multiplying the two matrices, we have finally for the c's:

Values of $c_i$
COEFFICIENTS OF TEST SCORES IN ESTIMATES OF PRINCIPAL COMPONENTS

                 Component
Test i      1       2       3       4
  1       .414   −.363   −.580   −.876
  2       .399   −.345    .582    .827
  3       .415    .582   −.406    .586
  4       .209    .287    .514   −.356
For comparison with these coefficients (which for any particular
component are to be read down a column), we give below the coefficients
$A_i = a_i/k$, obtained simply by dividing the columns of the table at the end
of Section 5 by their characteristic roots. These are the coefficients
which would be used with the true scores, if we knew them.

Values of $A_i = a_i/k$
COEFFICIENTS OF TRUE SCORES IN EQUATIONS FOR PRINCIPAL COMPONENTS

                 Component
Test i      1       2       3       4
  1       .440   −.303   −.557   −1.467
  2       .373   −.424    .550    1.394
  3       .331    .456   −.724    1.139
  4       .315    .447    .883    −.824
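As a check on the arithmetic of this section, the sketch below inverts the raw correlation matrix by carrying the unit columns and a row-sum check column along together, then forms $c_h = R'_{hi} b_i$ as in (42). It uses plain Gauss–Jordan elimination as a stand-in for Doolittle's method (the bookkeeping differs, the inverse is the same), and the helper name `invert_with_check` is hypothetical. Because the inputs are the three-decimal values printed above, agreement with the table of c's can only be expected to within a few units in the third decimal.

```python
# Raw correlations r'_ij of the four-test illustration, written out in full.
R = [
    [1.000, 0.633, 0.241, 0.059],
    [0.633, 1.000, -0.055, 0.065],
    [0.241, -0.055, 1.000, 0.425],
    [0.059, 0.065, 0.425, 1.000],
]
# b_i = a_i * sqrt(r_I), one list per component (the columns of the table of b's).
B = [
    [0.779, 0.652, 0.582, 0.436],
    [-0.425, -0.589, 0.637, 0.491],
    [-0.279, 0.271, -0.360, 0.346],
    [-0.232, 0.217, 0.179, -0.104],
]


def invert_with_check(matrix):
    """Invert by Gauss-Jordan elimination, carrying the unit columns and a check column."""
    n = len(matrix)
    rows = []
    for i, row in enumerate(matrix):
        augmented = list(row) + [1.0 if j == i else 0.0 for j in range(n)]
        rows.append(augmented + [sum(augmented)])  # check entry = sum of preceding entries
    for p in range(n):
        pivot = rows[p][p]  # positive for a positive definite correlation matrix
        rows[p] = [v / pivot for v in rows[p]]
        for i in range(n):
            if i != p:
                factor = rows[i][p]
                rows[i] = [v - factor * w for v, w in zip(rows[i], rows[p])]
    for row in rows:
        # Row operations are linear, so the check entry must still equal the row sum.
        assert abs(row[-1] - sum(row[:-1])) < 1e-9, "arithmetic check failed"
    return [row[n:2 * n] for row in rows]


R_inv = invert_with_check(R)  # the R'_ij
# Equation (42): c_h = R'_hi b_i, computed for each component in turn.
C = [[sum(R_inv[h][i] * b[i] for i in range(4)) for h in range(4)] for b in B]
for component, c in enumerate(C, start=1):
    print(component, [round(v, 3) for v in c])
```

Each printed list corresponds to one column of the table of $c_i$ above.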
9. ITERATIVE SOLUTION OF NORMAL EQUATIONS, CONVERGENCE


If the tests are numerous and only a few of the principal components
are to be found, it may be better to use an iterative method
of solving the normal equations (40). However the advantage is
not as overwhelming as that of the iterative method of Section 4 for
the determination of the $a_{ij}$. The speediest iterative process now
available seems to be that of Kelley and Salisbury.¹ It is a
modification, which accelerates convergence, of the following method,
which is similar to those of Gauss, Jacobi, and Seidel,² differing from
that of Seidel only in that the normal equations and the unknowns may
here be regarded as transformed so as to make the coefficients
correlations. In demonstrating that this method converges, it will follow
a fortiori that that of Kelley and Salisbury converges, since in using
it, T is diminished even more rapidly than in the following method.
The last member of (39) may be written

$$T = (r'_{1j} c_j - b_1)^2 + T_1,$$

where $T_1$ does not involve $c_1$. Hence if we start with any assumed
values whatever for $c_1, c_2, \ldots, c_n$, and then change $c_1$ so as to
make the expression in parentheses vanish, that is, if we replace
$c_1$ by

$$-r'_{12} c_2 - r'_{13} c_3 - \cdots - r'_{1n} c_n + b_1,$$

we shall thereby diminish the value of T, which we are trying to make
a minimum; except that if $c_1$ already had this value, T is unchanged.
Then, using the revised values, we replace $c_2$ by

$$-r'_{21} c_1 - r'_{23} c_3 - \cdots - r'_{2n} c_n + b_2.$$

If this means any change in $c_2$, T diminishes still further, and this
modified value of $c_2$ makes the squared term vanish. Continuing
in this way, each variable in turn is modified so as to diminish T,
or to leave it stationary. The condition that T remain stationary
through an entire cycle of changes in all the variables is that each
of them already has the indicated value. But in this case (40) holds;
the solution is already attained. Hence, if the assumed values
differ from the true ones, T will actually diminish in a cycle of
substitutions, and not remain stationary.

¹ Journal of the American Statistical Association, Vol. XXI, 1926, pp. 282-292.
² Whittaker and Robinson, The Calculus of Observations, Sec. 130.
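The cyclic substitution just described can be sketched as follows; the function name `seidel` and the fixed number of cycles are our choices, not the text's. Because $r'_{ii} = 1$, making the ith expression in parentheses vanish gives the new $c_i$ directly. Applied to the first component of the four-test illustration of Section 8, it should reproduce, to rounding, the first column of the table of c's.

```python
R = [
    [1.000, 0.633, 0.241, 0.059],
    [0.633, 1.000, -0.055, 0.065],
    [0.241, -0.055, 1.000, 0.425],
    [0.059, 0.065, 0.425, 1.000],
]
b = [0.779, 0.652, 0.582, 0.436]  # b_i for the first component (table of b's)


def seidel(R, b, cycles=100):
    """Solve r'_ij c_j = b_i by cyclic substitution, starting from c = 0."""
    n = len(b)
    c = [0.0] * n
    for _ in range(cycles):
        for i in range(n):
            # r'_ii = 1, so making (r'_ij c_j - b_i) vanish gives this new c_i,
            # computed with the most recently revised values of the other c's.
            c[i] = b[i] - sum(R[i][j] * c[j] for j in range(n) if j != i)
    return c


c = seidel(R, b)
residual = max(abs(sum(R[i][j] * c[j] for j in range(4)) - b[i]) for i in range(4))
print([round(v, 3) for v in c], residual)
```

In practice one would stop when a full cycle changes no $c_i$ beyond the desired number of decimals rather than after a fixed count.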

Let us regard the trial values as coordinates of a point in a space
of n dimensions. The first modification amounts to moving all such
points, representing possible sets of trial values, parallel to the axis
of $c_1$, onto the hyperplane

$$r'_{1j} c_j - b_1 = 0. \qquad (43)$$

The next modification is equivalent to a projection parallel to the
axis of $c_2$, from this onto a new hyperplane, and so on. All these
hyperplanes pass through a point whose coordinates are the desired
solution. A complete cycle of the modifications carries the points
back to the hyperplane (43), which is thus transformed into itself.
The equation

$$T_1 = \alpha, \qquad (44)$$

where $T_1$ is the function of $c_2, \ldots, c_n$ defined above and $\alpha$ is any
constant, represents an ellipsoid in (43). This, in the transformation
of the hyperplane into itself, is carried into a new ellipsoid, whose
equation may be written

$$T_1' = \alpha, \qquad (45)$$

where $T_1'$ is a new quadratic form in $c_2, \ldots, c_n$.

Since each value of T, and therefore of $T_1$, is diminished in a
cycle of the substitutions, each point of (44) is carried into a point
within (44). Hence the ellipsoid (45) lies entirely within (44), not
even touching it. This geometrical fact supplies us with information
regarding the roots of $F(\lambda) = 0$, where $F(\lambda)$ is the discriminant of the
second-degree terms of the quadratic form

$$T_1' - \lambda T_1.$$

These roots are invariant under all linear transformations to which
both $T_1$ and $T_1'$ may be subjected. Let us then stretch the space so
that $T_1 = \alpha$ becomes the unit sphere. The roots of $F(\lambda) = 0$ are then
the squares of the reciprocals of the semiaxes of the transformed
ellipsoid $T_1' = \alpha$. Since this will lie entirely within the unit sphere,
all these roots must be greater than unity. Let $\lambda'$ be the least root.
Then $\lambda' > 1$. Each value $\alpha$ of $T_1$ will, in the course of a cycle of
substitutions, diminish to a value not exceeding $\alpha/\lambda'$. Hence, in
m cycles, the reduced value of $T_1$ will come not to exceed $\alpha/\lambda'^m$, and
will therefore approach zero. In this way we have a definite proof
of the convergence of the Seidel process, which may not have been
previously demonstrated.

While the value of $T_1$ is divided by at least $\lambda'$ in each cycle, it will
ordinarily be divided by a greater number, since the trial point will
seldom lie on the longest axis of the stretched ellipsoid.
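To see the geometric argument at work numerically, the sketch below tracks, after each full cycle of substitutions, the excess of T over its minimum; for a trial point c this excess equals the quadratic form $(c - c^*)' R' (c - c^*)$, where $c^*$ solves (40). Treating this excess as the quantity the ellipsoid argument bounds is our reading, and the helper names are hypothetical. The printed ratios are the factors by which the excess is divided in each cycle; the argument says they cannot fall below $\lambda' > 1$.

```python
R = [
    [1.000, 0.633, 0.241, 0.059],
    [0.633, 1.000, -0.055, 0.065],
    [0.241, -0.055, 1.000, 0.425],
    [0.059, 0.065, 0.425, 1.000],
]
b = [0.779, 0.652, 0.582, 0.436]  # first component of the illustration


def sweep(R, b, c):
    """One cycle of substitutions, updating c in place: c_i <- b_i - sum_{j != i} r'_ij c_j."""
    n = len(b)
    for i in range(n):
        c[i] = b[i] - sum(R[i][j] * c[j] for j in range(n) if j != i)


def excess(R, c, c_star):
    """T - T_min, equal to the quadratic form (c - c*)' R' (c - c*) when R' c* = b."""
    d = [x - y for x, y in zip(c, c_star)]
    return sum(R[i][j] * d[i] * d[j] for i in range(len(d)) for j in range(len(d)))


c_star = [0.0] * len(b)
for _ in range(200):  # iterate to convergence to obtain the solution of (40)
    sweep(R, b, c_star)

c = [0.0] * len(b)  # arbitrary starting values
history = [excess(R, c, c_star)]
for _ in range(6):
    sweep(R, b, c)
    history.append(excess(R, c, c_star))

# Factor by which the excess is divided in each cycle.
ratios = [prev / nxt for prev, nxt in zip(history, history[1:]) if nxt > 1e-14]
print(["%.2e" % h for h in history])
print(["%.1f" % r for r in ratios])
```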


10. TESTS AS SAMPLES OF A LARGER AGGREGATE OF TESTS* 


Instead of regarding the analysis of a particular set of tests as our 
ultimate goal, we may look upon these merely as a sample of a hypo- 
thetical larger aggregate of possible tests. Our aim then is to learn 
something of the situation portrayed by the large aggregate. We are 
thus brought to a type of sampling theory quite distinct from that 
which we have heretofore considered. Instead of dealing with the 
degree of instability of functions of the correlations of the observed 
tests arising from the smallness of the number of persons tested, 
regarded as a sample of a larger population of persons, we are now 
concerned with the degree of instability resulting from the limited 
number of tests whose correlations enter into our analysis. The 
theory of this subject is far from complete, but some relevant results 
will be set forth. These results are based on a concept of the tests 
as a random sample from a hypothetical infinite population of possible 
tests. As in other uses of sampling theory, the reservation must be
made that if the sampling is not random the results cannot be applied
with accuracy. If for example the tests are deliberately chosen so as
to be highly correlated with a particular one of them, or so that a
group of them will have low correlations with each other, as might be
done in developing a battery of tests designed to estimate some
criterion variable with maximum multiple correlation, they cannot
be called a random sample.

* The central importance of this concept came to light in discussions with
Professor Clark V. Hull.

Our n variables might also be treated as a sample from a larger 
finite population of tests. We might then regard the population of 
tests as resolved into its principal components $\Gamma_1, \Gamma_2, \ldots$, and ask
such questions as to the extent to which $\gamma_1$ is likely to be correlated
with $\Gamma_1$, $\gamma_2$ with $\Gamma_2$, ..., and $\gamma_n$ with $\Gamma_n$. We should certainly
wish to know how well the fractions of total variance of the observed 
tests contributed by the principal components represent the corre- 
sponding fractions of the total variance of the population of tests. 
In this as in other connections, the concept of sampling from a finite 
population helps to fix the ideas and make them more tangible, but 
involves mathematics which is more difficult, and I think less generally 
applicable, than arises when we go at once to the limit in increasing 
the population. Using the “infinite population” concept amounts
merely to treating our probabilities as independent, and does not 
involve going into the mathematical theory of aggregates or sets, 
which have somewhat the same relation to the hypothetical infinite 
populations of statistics that the pure spaces of analysis situs bear 
to the metrical spaces of ordinary geometry. 

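Let the fractions of the total variance contributed by the successive
principal components in a sample of n tests be

$$f_1 = \frac{k_1}{n}, \quad f_2 = \frac{k_2}{n}, \quad \ldots, \quad f_n = \frac{k_n}{n}.$$

These are the roots of the equation obtained by putting k = nf in
the characteristic equation (29), p. 429:

$$f^n - f^{n-1} + \frac{S_2}{n^2} f^{n-2} - \frac{S_3}{n^3} f^{n-3} + \cdots = 0. \qquad (46)$$

Here

$$S_2 = \sum_{i>j} D_{ij}, \qquad S_3 = \sum_{i>j>k} D_{ijk}, \qquad \ldots,$$

where

$$D_{ij} = \begin{vmatrix} 1 & r_{ij} \\ r_{ij} & 1 \end{vmatrix}, \qquad D_{ijk} = \begin{vmatrix} 1 & r_{ij} & r_{ik} \\ r_{ij} & 1 & r_{jk} \\ r_{ik} & r_{jk} & 1 \end{vmatrix}, \qquad \ldots$$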
It is desirable to show that, by taking a sufficiently large number, n,
of tests at random, we can make the probability arbitrarily small
that any of a finite set of the roots will differ by more than a fixed
amount from definite expected values. This property, if substantiated,
makes the roots $f_1, f_2, \ldots$ consistent statistics, and justifies
the use of the method of principal components in spite of the
arbitrariness of metric mentioned in Section 1, for it implies that sets
of tests independently devised to study a complex situation may,
under the random conditions assumed, be relied upon to give similar
results if only they are sufficiently numerous. While this fundamental
theorem has not been established with full rigor, its high degree of
plausibility follows from the following considerations.

Since each of the correlations $r_{ij}$ has its entire distribution confined
between the limits ±1, all its moments are finite. The same must
necessarily be true of every polynomial in the $r_{ij}$'s. In particular,
the determinants $D_{ij}, D_{ijk}, \ldots$, and also the powers and products
of these determinants, being polynomials in the $r_{ij}$'s, must all have
definite finite expectations. Now $S_k$ is the sum of

$$C^n_k = \frac{n(n-1)(n-2)\cdots(n-k+1)}{1 \cdot 2 \cdot 3 \cdots k}$$

of these determinants. If we denote the elementary symmetric
functions of the roots by $b_1', b_2', \ldots$, so that $b_k'$ is the sum of the
products of the f's taken k at a time, then by (46),

$$b_k' = \frac{S_k}{n^k}.$$

Hence $b_k'$ has a definite expectation, $Eb_k'$, depending on n, but
approaching a finite limit $\beta_k$ as n increases, since $C^n_k/n^k$ approaches a
finite limit.
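The relation $b_k' = S_k/n^k$ can be checked numerically on any correlation matrix. The sketch below (helper names hypothetical) sums principal minors with `itertools.combinations`, confirms that there are $C^n_k$ of them, and compares $b_2'$ with $(1 - \sum f_i^2)/2$, using $\sum f_i^2 = \mathrm{tr}(R^2)/n^2$ so that no roots need be computed. The raw correlations of the four-test illustration in Section 8 serve merely as a convenient example.

```python
from itertools import combinations
from math import comb


def det(m):
    """Determinant by cofactor expansion along the first row (adequate for small minors)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def S(R, k):
    """Return (S_k, number of minors): the sum of all principal k-by-k minors of R."""
    minors = [det([[R[i][j] for j in idx] for i in idx])
              for idx in combinations(range(len(R)), k)]
    return sum(minors), len(minors)


# Raw correlations of the four-test illustration in Section 8, used only as an example.
R = [
    [1.000, 0.633, 0.241, 0.059],
    [0.633, 1.000, -0.055, 0.065],
    [0.241, -0.055, 1.000, 0.425],
    [0.059, 0.065, 0.425, 1.000],
]
n = len(R)

S2, count = S(R, 2)
b2_from_S = S2 / n ** 2  # b_2' = S_2 / n^2
# sum f_i^2 = tr(R^2) / n^2, and b_2' = ((sum f_i)^2 - sum f_i^2) / 2 with sum f_i = 1.
trace_R2 = sum(R[i][j] ** 2 for i in range(n) for j in range(n))
b2_from_trace = (1 - trace_R2 / n ** 2) / 2
print(count == comb(n, 2), round(b2_from_S, 5), round(b2_from_trace, 5))
```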

We next show that a value of n may be chosen large enough so
that $b_k'$ and $Eb_k'$ will differ arbitrarily little in an arbitrarily great
proportion of samples of n. This follows from the Tchebycheff
inequality used in establishing various Laws of Great Numbers, as
soon as we show that the variance of $b_k'$ approaches zero as n increases.

The variance of $b_k'$ is that of $S_k$ divided by $n^{2k}$. The variance of
$S_k$ is the sum of the variances of the $C^n_k$ determinants of order k
and of double the covariances of pairs of these determinants. The
number of pairs of such determinants with h common subscripts is

$$\frac{n!}{2\,h!\,[(k-h)!]^2\,(n-2k+h)!} = \tfrac{1}{2}\, C^n_h\, C^{n-h}_{k-h}\, C^{n-k}_{k-h},$$

a polynomial in n of degree 2k − h. For h = 0, all the covariances
vanish, since those determinants which have no common subscripts
are entirely independent. The most numerous group of non-vanishing
covariances is for h = 1, corresponding to determinants with a single
common subscript. The number of these terms is given by a polynomial
in n of degree 2k − 1. Since the number of terms in each of
the other groups (h = 2, 3, ..., k) is given by a polynomial of lower
degree, and since each term has a fixed value depending only on the
population, the total variance of $S_k$ is of degree 2k − 1 in n. Thus
the variance of $b_k'$ vanishes as $n^{-1}$. Its standard deviation, and
therefore, almost always, its deviation from $Eb_k'$, will be of order
$n^{-1/2}$ for large values of n.
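The count of pairs of determinants sharing exactly h subscripts can be confirmed by brute force for small n and k. The sketch compares the closed form with a direct enumeration of pairs of k-subsets of the n subscripts, restricted to h < k since h = k would pair a determinant with itself; the function names are hypothetical.

```python
from itertools import combinations
from math import factorial


def pairs_formula(n, k, h):
    """n! / (2 h! [(k-h)!]^2 (n-2k+h)!): unordered pairs of k-subsets of n subscripts
    sharing exactly h of them (valid for h < k)."""
    if n - 2 * k + h < 0:
        return 0
    return factorial(n) // (2 * factorial(h) * factorial(k - h) ** 2
                            * factorial(n - 2 * k + h))


def pairs_brute(n, k, h):
    """Count the same pairs by direct enumeration."""
    subsets = [set(s) for s in combinations(range(n), k)]
    return sum(1 for a, b in combinations(subsets, 2) if len(a & b) == h)


for h in range(3):
    print(h, pairs_formula(7, 3, h), pairs_brute(7, 3, h))
# For n = 7, k = 3 both methods give 70, 315, 210 for h = 0, 1, 2.
```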

Since the elementary symmetric functions $b_k'$ have definite
expectations about which they cluster with standard deviations of order
$n^{-1/2}$, the same is true for the power sums

$$e_k' = \sum_{i=1}^{n} f_i^k \qquad (k = 1, 2, \ldots, n),$$

for these are polynomials in the $b_i'$. Let

$$\epsilon_k = \lim_{n \to \infty} E e_k'.$$

If we put

$$\epsilon_k = \sum_{i=1}^{\infty} \phi_i^k \qquad (k = 1, 2, \ldots, \infty),$$

then it appears highly plausible that these equations have solutions
$\phi_1, \phi_2, \ldots$,
each between 0 and 1, which we take in descending order of magnitude,
and which are the population values about which the sample f's
cluster. We may expect the $\phi$'s to satisfy an equation analogous
to (46), obtained by dividing by the term of highest degree, replacing
each coefficient by its expectation, and letting n increase:

$$1 - \frac{1}{\phi} + \frac{\beta_2}{\phi^2} - \frac{\beta_3}{\phi^3} + \cdots = 0,$$
this being now an infinite series. The $\phi$'s may be described as the
fractions of the total variance of the population accounted for by its
principal components. The more rapidly the series

$$\phi_1 + \phi_2 + \phi_3 + \cdots$$

converges to unity, the more definitely do the various abilities tested
depend upon a small number of underlying characters. The speed of
convergence may be measured for example by

$$\epsilon_2 = \phi_1^2 + \phi_2^2 + \cdots.$$

Each of the $\epsilon$'s will be less than that of next lower order, as is the
case with the $e'$'s. The $e'$'s and $\epsilon$'s are in fact the moments of a
frequency distribution in sample and population, respectively, and
the question of the extent to which they, or any finite number of
them, determine the $\phi$'s is very similar to the classical moment
problems on which so much has been done. This transition from the
moments of the distribution of $\phi$'s to the distribution itself is all
that is needed to establish the fundamental theorem of consistency
of the f's.

Without going into the interesting mathematical questions raised
in this way, we shall be on firm ground in dealing with the $\epsilon$'s and
their estimation by means of a sample. We shall close this section
by proving that $e_2'$, as an estimate of $\epsilon_2$, has a bias of order $n^{-1}$, and
showing how to correct for this bias.

From the relations between the power-sums and the elementary
symmetric functions of the roots of (46) we have:

$$e_2' = 1 - 2b_2', \qquad e_3' = 1 - 3b_2' + 3b_3', \qquad \ldots$$

From the first of these,

$$E e_2' = 1 - 2Eb_2' = 1 - \frac{2(n-1)}{n}\beta_2 = \epsilon_2 + \frac{2\beta_2}{n}.$$

Hence if we put

$$e_2 = \frac{n e_2' - 1}{n - 1},$$

$e_2$ will be an unbiassed estimate of $\epsilon_2$, since $Ee_2 = \epsilon_2$. This bias in
$e_2'$ means that the fraction $f_1$ of the variance contributed by the
leading principal component of n tests is an exaggeration, of order
$n^{-1}$, of the corresponding quantity $\phi_1$ in the population. For small
values of n this correction is very important. In the example of
Section 5, in which we took only four tests, $e_2'$ is about .38, so that
$e_2$ is only about .17. If the value of $\epsilon_2$ is .17, $\phi_1$ cannot possibly be
so great as the value .465 found for $f_1$ in the example, since the square
of this quantity is by itself greater than .17.
The unbiassed estimate of $\epsilon_3$ is found similarly, but more
laboriously, to be

$$e_3 = \frac{n^2 e_3' - 3n e_2' + 2}{(n-1)(n-2)}.$$

For the higher degrees it is simpler to work with the elementary
symmetric functions than with the power-sums. The unbiassed
estimate of $\beta_k$ is merely

$$\frac{n^{k-1}\, b_k'}{(n-1)(n-2)\cdots(n-k+1)}.$$
11. PRINCIPAL COMPONENTS WITH PERFECT WEIGHTING 


Consider a finite or infinite population of variates z;, which we 
shall call tests, related to a finite or infinite sequence of variates I; 
which in an infinite population of persons are independently dis- 
tributed with unit variance. Let the relations be 


%ii=anli tanl2+ °°: , (47) 


where the coefficients $a_{ij}$ do not vary in the population of persons, 
but vary independently in the population of tests. Let us assume that, 
for each second subscript j, the $a_{ij}$'s have the mean value zero and 
variance $\phi_j$. Denoting the mean value of a quantity in the population 
of tests by a prefixed E, this means that 

$$Ea_{ij} = 0, \qquad Ea_{ij}^2 = \phi_j, \qquad (48)$$

and 

$$Ea_{ij}a_{kl} = 0 \quad \text{unless } i = k \text{ and } j = l. \qquad (49)$$
The covariance of $z_i$ and $z_k$ is 

$$p_{ik} = a_{i1}a_{k1} + a_{i2}a_{k2} + \cdots = \sum_j a_{ij}a_{kj}. \qquad (50)$$

The variance of $z_i$ is 

$$p_{ii} = a_{i1}^2 + a_{i2}^2 + \cdots, \qquad (51)$$
and under these hypotheses of independence among the a’s cannot be 
constrained to be unity. Indeed, these assumptions define a natural 
unit of measure for each variate. If values, expressed in these natural 
units, of a sample of n variates, for each of a sufficient number of 
persons, are available, we may improve upon the method of analysis 
into principal components which we have heretofore considered. 
Under these circumstances, instead of using the matrix of correlations, 
we should apply the same operations to the matrix of covariances. 
The results obtained in this way have so much more elegant interpre- 
tations for the quantities analogous to the moments of the last section, 
giving for example exact standard error formulae, that it is worth 
while to consider their theory in spite of the fact that the absence or 
indefiniteness of natural units in the tests commonly met with in 
practice precludes direct applications at present. Exact standard 
errors for the $e_s$'s of the last section have not been obtained, but 
some idea of their magnitudes may be based upon the exact results 
found for the corresponding quantities in the present section. 

Wherever weights can be applied to tests which may reasonably 
be supposed to transform them approximately into the natural 
units here defined, such weights should be preferred to the equali- 
zation of their variances which we have heretofore considered. The 
considerations of the last section, which indicated that consistent 
results will be obtained under very general conditions when the 
variances are arbitrarily equalized if enough tests are used, suggest 
similarly that any reasonable system of weights will give consistent 
results. 

To make our covariances as much as possible like correlations 
and thus to obtain results suggestive of those appropriate to the 
more common situation, let us take the mean value of $p_{ii}$ as unity. 
Putting $Ep_{ii} = 1$ in (51) and using the second of (48) gives 

$$\phi_1 + \phi_2 + \phi_3 + \cdots = 1. \qquad (52)$$

These $\phi$'s resemble but are not identical with those of the last section. 
The same is true of the quantities $\epsilon_s$, defined by 

$$\epsilon_s = \sum_{q \ge 1} \phi_q^{s}, \qquad (53)$$

and of the other quantities defined below. 
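In place of (46), the characteristic equation for n tests expressed 
in natural units is 

$$f^n - \frac{S_1}{n} f^{n-1} + \frac{S_2}{n^2} f^{n-2} - \frac{S_3}{n^3} f^{n-3} + \cdots = 0, \qquad (54)$$

where 

$$S_1 = \sum_i p_{ii}, \qquad S_2 = \sum_{i>j} P_{ij}, \qquad S_3 = \sum_{i>j>k} P_{ijk}, \qquad (55)$$

the notation now being 

$$P_{ij} = \begin{vmatrix} p_{ii} & p_{ij} \\ p_{ji} & p_{jj} \end{vmatrix}, \qquad P_{ijk} = \begin{vmatrix} p_{ii} & p_{ij} & p_{ik} \\ p_{ji} & p_{jj} & p_{jk} \\ p_{ki} & p_{kj} & p_{kk} \end{vmatrix}, \qquad (56)$$

and so forth. We also put 

$$e_s' = \sum_{q=1}^{n} f_q^{s}, \qquad (57)$$

the $f_q$'s being the roots of (54). Evidently 

$$e_1' = \frac{S_1}{n}, \qquad e_2' = \frac{S_1^2 - 2S_2}{n^2}, \qquad e_3' = \frac{S_1^3 - 3S_1S_2 + 3S_3}{n^3}. \qquad (58)$$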
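The relations (58) are Newton's identities: $S_k$, the sum of the k-rowed principal minors, is the k-th elementary symmetric function of the roots, and $e_s'$ is the s-th power sum of the roots divided by $n^s$. The sketch below checks this on a small made-up symmetric matrix standing in for the $p_{ik}$ (illustrative values only), computing each $S_k$ from principal minors as in (55) and (56) and comparing with traces of powers of the matrix, which give the power sums of its roots directly.

```python
from fractions import Fraction
from itertools import combinations


def det(m):
    """Determinant by cofactor expansion (adequate for small minors)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** c * m[0][c] * det([row[:c] + row[c + 1:] for row in m[1:]])
               for c in range(len(m)))


def principal_minor_sum(p, k):
    """S_k of (55): the sum of all k-by-k principal minors of p."""
    return sum(det([[p[i][j] for j in idx] for i in idx])
               for idx in combinations(range(len(p)), k))


def trace_power(p, s):
    """trace(p**s), the sum of the s-th powers of the roots of p."""
    n = len(p)
    m = [row[:] for row in p]
    for _ in range(s - 1):
        m = [[sum(m[i][t] * p[t][j] for t in range(n)) for j in range(n)] for i in range(n)]
    return sum(m[i][i] for i in range(n))


# Hypothetical symmetric "covariance" matrix for n = 4 tests.
p = [[Fraction(x) for x in row] for row in
     [[3, 1, 0, 2], [1, 2, 1, 0], [0, 1, 4, 1], [2, 0, 1, 5]]]
n = len(p)
S1, S2, S3 = (principal_minor_sum(p, k) for k in (1, 2, 3))

e2_from_S = (S1 ** 2 - 2 * S2) / n ** 2
e3_from_S = (S1 ** 3 - 3 * S1 * S2 + 3 * S3) / n ** 3
print(e2_from_S == trace_power(p, 2) / n ** 2, e3_from_S == trace_power(p, 3) / n ** 3)
```

Exact fractions are used so that the equalities hold without rounding error.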


It is now a straightforward matter to find the expectations of the 
$e_s'$, their variances, and any desired moments of them, in terms of 
the $\epsilon$'s. With the different conditions of the last section this was 
not possible by any means so far discovered, except for the simplest 
cases, the first moments of the $e_s'$, which made possible the correction 
of bias. 

The moments of the expressions in (58) may be found with the 
help of (55) and (56), from the mean values of the powers and products 
of the $p_{ij}$. These are obtained by means of (50), (48), (49), and 
whatever assumption is made regarding the higher moments of the 
$a_{ij}$'s. We may assume that the values of the $a_{ij}$ for a fixed second 
subscript j are normally distributed; from this it follows that 

$$Ea_{ij}^3 = 0, \quad Ea_{ij}^4 = 3(Ea_{ij}^2)^2 = 3\phi_j^2, \quad Ea_{ij}^5 = 0, \quad Ea_{ij}^6 = 15\phi_j^3, \quad \cdots \qquad (59)$$
From (58) and (55) we have: 

$$Ee_1' = \frac{ES_1}{n} = Ep_{ii} = 1;$$

while 

$$Ee_2' = \frac{ES_1^2 - 2ES_2}{n^2}. \qquad (60)$$

To evaluate the last expression we need $ES_1^2$ and $ES_2$. 

$$ES_1^2 = \sum_{i=1}^{n} Ep_{ii}^2 + 2\sum_{i>j} Ep_{ii}p_{jj}. \qquad (61)$$

From (51), (59), (48), (52), and (53), 

$$Ep_{ii}^2 = E\sum_q a_{iq}^4 + 2\sum_{q<r} Ea_{iq}^2 a_{ir}^2 = 3\sum_q \phi_q^2 + 2\sum_{q<r} \phi_q\phi_r = 3\epsilon_2 + (\epsilon_1^2 - \epsilon_2) = 1 + 2\epsilon_2. \qquad (62)$$

Also, since $p_{ii}$ and $p_{jj}$ are independent when $i \ne j$, 

$$Ep_{ii}p_{jj} = (Ep_{ii})(Ep_{jj}) = (Ep_{ii})^2 = 1.$$

Substituting in (61), 

$$ES_1^2 = n(1 + 2\epsilon_2) + n(n - 1) = n^2 + 2n\epsilon_2. \qquad (63)$$

Now from (50), with $i \ne j$, 

$$Ep_{ij}^2 = E\Big(\sum_q a_{iq}a_{jq}\Big)^2.$$

Since $Ea_{iq}a_{jq}a_{ir}a_{jr} = 0$ when $q \ne r$, this equals 

$$\sum_q Ea_{iq}^2\, Ea_{jq}^2 = \sum_q \phi_q^2 = \epsilon_2. \qquad (64)$$

Substituting this result and (62) in the first of (56), 

$$EP_{ij} = 1 - \epsilon_2.$$

Hence from (55), 

$$ES_2 = \tfrac12 n(n - 1)(1 - \epsilon_2).$$

Putting this result and (63) in (60) we have 

$$Ee_2' = \frac{1 + (n + 1)\epsilon_2}{n}. \qquad (65)$$
Consequently an unbiassed estimate of $\epsilon_2$ is 

$$e_2 = \frac{n e_2' - 1}{n + 1}. \qquad (66)$$

This discounts the sample value $e_2'$, and therefore the fraction of 
variance attributable to the chief component, even more than does 
the corresponding result in the preceding section. 
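Formulas (65) and (66) lend themselves to a simulation check. The sketch below assumes a hypothetical population with only three components, $\phi = (.5, .3, .2)$, so that (52) holds and $\epsilon_2 = .38$. For each replication it draws the $a_{ij}$ of n = 4 tests from normal distributions with variances $\phi_j$, as in (48) and (59), forms the covariances $p_{ik}$ of (50), and computes $e_2' = \sum_q f_q^2 = \operatorname{tr}(P^2)/n^2$. The component values, number of replications and seed are arbitrary choices, not anything taken from the paper.

```python
import random


def sample_e2_prime(phis, n, rng):
    """Draw n tests from the model (47)-(49) with normal a_ij, as in (59), and
    return e_2' = sum of f_q**2 = trace(P**2) / n**2, P being the matrix of p_ik."""
    a = [[rng.gauss(0.0, phi ** 0.5) for phi in phis] for _ in range(n)]
    total = 0.0
    for i in range(n):
        for k in range(n):
            p_ik = sum(x * y for x, y in zip(a[i], a[k]))   # covariance (50)
            total += p_ik * p_ik
    return total / n ** 2


def check_bias(phis, n, reps=20000, seed=0):
    rng = random.Random(seed)
    eps2 = sum(phi ** 2 for phi in phis)
    draws = [sample_e2_prime(phis, n, rng) for _ in range(reps)]
    mean_e2_prime = sum(draws) / reps
    mean_e2 = sum((n * d - 1) / (n + 1) for d in draws) / reps   # estimate (66)
    theory = (1 + (n + 1) * eps2) / n                             # formula (65)
    return eps2, mean_e2_prime, theory, mean_e2


eps2, mc, theory, mc_unbiased = check_bias([0.5, 0.3, 0.2], n=4)
print(f"epsilon_2 = {eps2:.3f}; mean e_2' = {mc:.3f} vs (65) {theory:.3f}; "
      f"mean e_2 = {mc_unbiased:.3f}")
```

With these assumed values (65) predicts a mean of .725 for $e_2'$, nearly twice $\epsilon_2$, while the average of the corrected estimate (66) should fall close to .38.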

The calculations required to get the mean values and moments 
of the power sums $e_s'$ are sufficiently illustrated by the foregoing 
procedure. The expectations of the powers and products of the $p_{ij}$'s, 
up to the fourth degree, are tabulated at the end of this section. 
Those cases not given explicitly in the table are immediately obvious 
when account is taken of the following principles: 

(a) The expectation of the product of independent quantities is 
the product of their expectations. Thus 

$$Ep_{ij}^2 p_{kl}^2 = (Ep_{ij}^2)(Ep_{kl}^2) = (Ep_{ij}^2)^2 = \epsilon_2^2,$$

where i, j, k, l are all different. 

(b) If the sum of the products of the exponents of the p's by the 
number of appearances in their respective subscripts of any letter is 
odd, the expectation is zero. Thus $Ep_{ii}p_{ij}$ and $Ep_{ik}p_{ij}^2$ vanish. 
The reason for this is that, when the p's are expressed in terms of the 
a's, each term will contain an odd number of factors with this 
subscript; their product is independent of the other factors, and will 
have the expectation zero. 
With the help of the table we find by straightforward calculation: 

$$Ee_3' = \frac{1 + 3(n + 1)\epsilon_2 + (n^2 + 3n + 4)\epsilon_3}{n^2}. \qquad (67)$$

To obtain an unbiassed estimate $e_3$ of $\epsilon_3$, we solve (65) and (67) 
for $\epsilon_3$ in terms of $Ee_2'$ and $Ee_3'$, drop the symbol E, and replace $\epsilon_3$ by 
$e_3$. This gives 

$$e_3 = \frac{n^2 e_3' - 3n e_2' + 2}{n^2 + 3n + 4}.$$

Standard errors and higher moments of these quantities can easily 
be, but have not been, deduced with the help of the table. 
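The step "solve, drop E, and replace $\epsilon_3$ by $e_3$" works because the estimates are linear in $e_2'$ and $e_3'$: substituting the expectations (65) and (67) must return $\epsilon_2$ and $\epsilon_3$ exactly. In (67) the coefficient $n^2 + 3n + 4$ comes from the $n$ diagonal terms of $\operatorname{tr}(P^3)$ (each contributing $8\epsilon_3$ through $Ep_{ii}^3$), the $3n(n-1)$ terms of type $p_{ii}p_{ij}^2$ (each $2\epsilon_3$) and the $n(n-1)(n-2)$ cyclic terms (each $\epsilon_3$). The sketch below performs the substitution check with exact fractions; the values of $\epsilon_2$ and $\epsilon_3$ are hypothetical.

```python
from fractions import Fraction


def expected_power_sums(n, eps2, eps3):
    """E e_2' from (65) and E e_3' from (67)."""
    ee2 = (1 + (n + 1) * eps2) / n
    ee3 = (1 + 3 * (n + 1) * eps2 + (n * n + 3 * n + 4) * eps3) / n ** 2
    return ee2, ee3


def unbiased_estimates(n, e2p, e3p):
    """e_2 from (66) and the corresponding estimate e_3."""
    e2 = (n * e2p - 1) / (n + 1)
    e3 = (n * n * e3p - 3 * n * e2p + 2) / (n * n + 3 * n + 4)
    return e2, e3


eps2, eps3 = Fraction(19, 50), Fraction(4, 25)   # hypothetical population values
for n in (3, 4, 10):
    print(n, unbiased_estimates(n, *expected_power_sums(n, eps2, eps3)) == (eps2, eps3))
```

Each line prints `True`. If the sample covariance matrix happened to be the identity, both estimates would be zero, as one would expect.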

If instead of using the moments $\epsilon_s$ of the $\phi$-distribution, we use 
the elementary symmetric functions $\beta_s$, we find the calculations 
somewhat simpler. The results are of course equivalent. We find 
as the unbiassed estimate of $\beta_s$, 
The variances of $b_1$ and $b_2$ are: 
n 2 je 4 2e3 - 4) + mar pa Tae 4e> + Ses + 6e*, = 10). 





o*,, = 





It should of course be remembered that these specific results are 
all based on the assumed normality of distribution of the coefficients 
of each component $\gamma$ in the population. Any other specific 
assumption as to these distributions would give analogous but not 
identical results. 
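TABLE ASSUMING NORMAL DISTRIBUTION OF THE $a_{ij}$ 
i, j, k Are All Unequal 
First degree: 
$Ep_{ii} = 1$. $Ep_{ij} = 0$. 
Second degree: 
$Ep_{ii}^2 = 1 + 2\epsilon_2$. $Ep_{ij}^2 = \epsilon_2$. 
Third degree: 
$Ep_{ii}^3 = 1 + 6\epsilon_2 + 8\epsilon_3$. $Ep_{ii}p_{ij}^2 = \epsilon_2 + 2\epsilon_3$. 
$Ep_{ij}^3 = 0$. $Ep_{ij}p_{jk}p_{ki} = \epsilon_3$. 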
Fourth degree: 


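$Ep_{ii}^4 = 1 + 12\epsilon_2 + 12\epsilon_2^2 + 32\epsilon_3 + 48\epsilon_4$. 
$Ep_{ii}^2 p_{ij}^2 = \epsilon_2 + 2\epsilon_2^2 + 4\epsilon_3 + 8\epsilon_4$. 
$Ep_{ii}p_{jj}p_{ij}^2 = \epsilon_2 + 4\epsilon_3 + 4\epsilon_4$. 

$Ep_{ij}^4 = 3\epsilon_2^2 + 6\epsilon_4$. 

$Ep_{ij}^2 p_{ik}^2 = \epsilon_2^2 + 2\epsilon_4$. 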

$Ep_{ii}p_{ij}p_{jk}p_{ik} = \epsilon_3 + 2\epsilon_4$. 

Epipijp ie = €2 + 26. 

$Ep_{ij}p_{jk}p_{kl}p_{li} = \epsilon_4$. 


12. THE “SAND” AND “COBBLESTONE” THEORIES OF THE MIND 





If a few mental characters such as general intelligence, cleverness, 
etc., are sufficient to account for virtually all the variance among 
individuals in all kinds of performances, we have a radically different 
situation from that of a large number of independent characters 
which all make small contributions to the variance. To distinguish 
between these two conditions, which have sometimes been referred 
to as the “cobblestone” and the “sand” theories, was one of the 
original objects in the analysis of mental tests. 
One criterion which might be employed to distinguish between 
these two theories is that of normality of distribution of the scores 
on a test. If the score is the linear resultant of a large number of 
independent variates making approximately equal contributions to 
the variance, the second limit theorem of probability shows that a 
normal distribution is to be expected. However this criterion is 
somewhat difficult to apply, for various reasons. Even on the cobble- 
stone theory, we might have a normal distribution, since each of 
the “cobblestones” might well itself be a normally distributed variate, 
so that the test is not decisive. Actually many distributions of scores 
are very far from normal, but this can often be ascribed to the very 
arbitrary units used; it is then customary deliberately to transform 
the variate into one of normal distribution. 

Far more delicate distinctions are possible when only the corre- 
lations among the tests, and not the higher moments, are the basis 
of the calculations. The fundamental consideration then is that on 
the “sand” theory we should expect low correlations between pairs of 
tests, while on the “cobblestone” theory we anticipate high ones. 
This can be made more definite with the help of the ideas used in the 
last section, though in applying the distinctions we do not find it 
necessary to assume that the “natural units” there indicated are 
known, or to base an analysis upon covariances instead of unweighted 
correlations. The notation used in this section has the same meaning 
as in Section 11. 

The primary question is as to the rapidity of convergence of 
the series (52), which may have a finite or an infinite number of terms. 
According to the “sand” theory the series should converge slowly, 
having no large terms, but a great number with approximately equal 
magnitudes, and larger than the rest. The “cobblestones” of the 
other theory may be interpreted as a few large $\phi$'s at the beginning of 
the series which account for nearly the whole of its value. There are, 
of course, infinitely many variations of the compromises between 
the views thus roughly described. 

Taking the special form of the “sand” theory in which the first 
m of the $\phi$'s are equal and the rest zero, we may from our data estimate 
m and infer reasonable upper and lower limits for it. Indeed, from 
(50) and (51), the correlation of the ith and pth tests is 

$$r_{ip} = \frac{\sum a_{ij}a_{pj}}{\sqrt{\sum a_{ij}^2 \, \sum a_{pj}^2}}, \qquad (68)$$

each of the summations being with respect to j and extending, under 
the theory we are now considering, from 1 to m. 

Now (68) is also the expression for the correlation coefficient which 
would be computed from the independent quantities 

$$\begin{matrix} a_{i1}, & a_{i2}, & \ldots, & a_{im} \\ a_{p1}, & a_{p2}, & \ldots, & a_{pm} \end{matrix}$$


if the usual definition of the correlation coefficient were varied by 
omitting the requirement that the sample means be eliminated. With- 
out this requirement, the distribution of the correlation coefficient 
in samples of m is the same as the distribution of the correlation 
coefficient as usually defined in samples of m + 1, provided that, 
as we have assumed for the $a_{ij}$'s, the quantities are normally and 
independently distributed about zero. This is evident from the same 
geometrical situation which led R. A. Fisher to the discovery of the 
distribution of the correlation coefficient.¹ The assumption that 
our tests are taken independently at random from the infinite aggregate 
implies that the population value of (68) is zero. Specific types of 
non-random selection of tests could be treated with the help of the 
distribution of r corresponding to non-vanishing values of the corre- 
lation in the population. However, on the assumption of random- 
ness, we take the simplest case of Fisher’s distribution, which, on 
putting m + 1 for the sample number, becomes: 


$$\frac{\Gamma(\tfrac12 m)}{\sqrt{\pi}\,\Gamma[\tfrac12(m - 1)]}\,(1 - r^2)^{\frac12(m - 3)}\,dr. \qquad (69)$$

The usual use of the sampling distribution of r is to determine 
whether, for a given sample size, the correlation is significantly greater 
than zero, or to find the greatest or least plausible true value of the 
correlation corresponding to a given probability of a greater 
discrepancy. But we shall make a different use of (69). Instead of 
knowing m and ascertaining whether r is excessive, we now consider 
that we know r, and wish to find the greatest value of m which, for a 
given level of probability, can be regarded as plausible. For a random 
pair of tests, the criterion is that the greatest acceptable value of 
m shall be that which reduces to some standard probability P, such 
as .05, the value of the integral of (69) outside of symmetrically placed 
limits which are both equal in absolute value to the sample correlation. 
The application of this criterion is very simple. For small samples, 
we use R. A. Fisher's Table V.A. in his Statistical Methods for Research 
Workers, which gives the values of r beyond which the integral of 
(69) takes the values .01, .02, .05 and .10, for various sample sizes. 
In this table, n is less by 2 than the sample size; since our sample size 
is m + 1, the tabled n equals m − 1. To obtain m, we therefore, after 
finding the approximate value of our observed correlation in the 
column corresponding to our standard probability P (e.g. .05), 
increase the value of n given opposite it by unity. 

1 Biometrika, Vol. X, 1915, p. 507. 

For sufficiently large values of m, the distribution (69) approximates 
the normal form. Its variance, easily found by multiplying 
by $r^2$ and integrating, is exactly $\sigma^2 = 1/m$. Hence the limiting form 
of (69) is 

$$\sqrt{\frac{m}{2\pi}}\; e^{-\frac12 m r^2}\, dr. \qquad (70)$$


Instead of using Fisher's table we may, if m is large enough (that 
is, if r is small enough), use the fact that the probability .05 of a 
greater deviation corresponds, for the normal distribution, to the 
deviation $1.96\sigma$. Putting this equal to the observed correlation r, 
we have 

$$m = \frac{1}{\sigma^2} = \frac{(1.96)^2}{r^2} = \frac{3.84}{r^2}$$

as the greatest acceptable value of m. 
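How good the normal shortcut is can be seen by applying the exact criterion to (69) directly, with the density $\Gamma(\tfrac12 m)\,(1 - r^2)^{(m-3)/2} / \{\sqrt{\pi}\,\Gamma[\tfrac12(m-1)]\}$ integrated numerically and m treated as continuous. The sketch below uses Simpson's rule for the tail integral and bisection on m (both are choices made here, not the author's procedure; Fisher's table is what the text relies on), and restricts itself to m ≥ 3, where the integrand stays bounded.

```python
import math


def tail_prob(r0, m, steps=2000):
    """P(|r| > r0) under (69), by Simpson's rule on [r0, 1], doubled for symmetry.
    Intended for m >= 3, where the integrand is bounded."""
    log_c = math.lgamma(m / 2) - math.lgamma((m - 1) / 2) - 0.5 * math.log(math.pi)
    h = (1.0 - r0) / steps
    total = 0.0
    for k in range(steps + 1):
        r = r0 + k * h
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * max(0.0, 1.0 - r * r) ** ((m - 3) / 2)
    return 2.0 * math.exp(log_c) * total * h / 3.0


def greatest_m(r_obs, p=0.05, lo=3.0, hi=1e6, iters=60):
    """Largest m for which |r_obs| stays inside the level-p limits of (69).
    The tail probability falls as m grows, so bisection on m suffices."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tail_prob(abs(r_obs), mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def greatest_m_normal(r_obs):
    """Large-m approximation: m = 1.96**2 / r**2."""
    return 1.96 ** 2 / r_obs ** 2


for r in (0.1, 0.3, 0.5):
    print(r, round(greatest_m(r), 1), round(greatest_m_normal(r), 1))
```

For r = .1 the two values agree to well under one per cent; for r = .5 the approximation 3.84/r² overstates the exact limit by roughly half a unit, in keeping with the warning that the normal form needs m to be large.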

A minimum acceptable value of m can be found similarly, by 
requiring that the integral of (69) between the observed correlation 
and its negative as limits shall have the standard value P. However 
in this case we cannot use Fisher’s table. Neither can we use the 
assumption of a normal distribution, unless r is very small, because 
the small values of m which must ordinarily correspond to minimum 
values make (69) far from normal. The simplest solution of this 
problem seems to be through trial values of m, expanding (69) in a 
series of powers of r? and integrating term by term. But it is only for 
correlations numerically less than P (= .05, say) that we can infer 
that m must be as high as 3. For the mental tests we have used for 
illustration, no corrected correlation so small as .05 occurs. 

Between these upper and lower limits, the value of m may be 
estimated by the principle of maximum likelihood, which amounts 
to choosing m so as to maximize (69). If in place of (69) we maximize 
its approximate value (70), our estimate is simply 

$$m = \frac{1}{r^2}.$$
A more accurate estimate may be made with the help of a number 
of independent correlations, $r_1, r_2, \ldots, r_p$. Their joint distribution 
is obtained upon multiplying together expressions like (69). Multiplying 
together the approximate values (70) instead, taking the logarithm, 
and omitting terms which do not involve m, we have 

$$L = \tfrac12 p \log m - \tfrac12 m \sum_{i=1}^{p} r_i^2,$$

which may be defined as the likelihood of m. It is a maximum for 

$$m = \frac{p}{\sum r_i^2}. \qquad (71)$$


The accuracy of this estimate, at any rate when p is large, is expressed 
by its variance, which is approximately¹ the negative reciprocal of the 
second derivative of L; that is 

$$\sigma_m^2 = \frac{2m^2}{p}.$$
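Formula (71) and its approximate variance are easy to evaluate. The sketch below is a minimal illustration; as input it uses the correlations of the first test with the other three that are quoted for the example of Section 5 (.698, .264, .081). With only three correlations the standard error comes out nearly as large as the estimate itself, so the value should be read as a rough guide only.

```python
import math


def ml_estimate_m(correlations):
    """(71): m = p / sum(r_i**2) from p independent correlations, together with
    the approximate standard error sqrt(2 m**2 / p)."""
    p = len(correlations)
    m = p / sum(r * r for r in correlations)
    return m, math.sqrt(2.0 * m * m / p)


m_hat, se = ml_estimate_m([0.698, 0.264, 0.081])
print(round(m_hat, 2), round(se, 2))   # 5.32 4.35
```

With a single correlation the function reduces to m = 1/r², the estimate obtained from (70) alone.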

Now the correlations of any number of variates with a single 
variate are independent of each other, if all the correlations are 
based on samples of a given size from a normal distribution. This is 
geometrically evident when it is recalled that the correlation coefficient 
is the cosine of the angle between two random lines in hyperspace, 
and therefore of the angular distance between random points on a 
hypersphere. If one point is fixed, the distances of the others from 
it are obviously independent of each other. 

From (71), applied to the correlations of the first with the other 
three variates given at the beginning of Section 5, we have as an 
estimate of m, 

$$m = \frac{3}{.698^2 + .264^2 + .081^2} = 5.3.$$

To obtain an estimate of m based on the whole set of observed 
correlations, and not merely independent ones, we must consider the 
simultaneous distribution of all the correlations among n variates in 
samples of N (= m + 1) from a normal population in which all 
correlations are zero. One way to obtain this is by the use of the 
partial correlations. 





1 Fisher, R. A.: On the Mathematical Foundations of Theoretical Statistics. 
Phil. Trans., Vol. CCXXII A, 1922, p. 328; H. Hotelling: The Consistency and 
Ultimate Distribution of Optimum Statistics. Trans. Amer. Math. Soc., Vol. 
XXXII, 1930, pp. 847-859. 











Analysis of a Complex of Statistical Variables 519 Be 


$$\begin{matrix}
r_{23\cdot1}, & r_{24\cdot1}, & \ldots, & r_{2n\cdot1}, \\
 & r_{34\cdot12}, & \ldots, & r_{3n\cdot12}, \\
 & & \ddots & \vdots \\
 & & & r_{n-1,n\cdot12\ldots n-2},
\end{matrix}$$

which are cosines of angles between planes and hyperplanes 
determined by the random lines mentioned above, and are independent of 
each other and of $r_{12}, r_{13}, \ldots, r_{1n}$. Each of these has the 
distribution (69), with the value of m diminished for each partial correlation 
by the number of variates eliminated. Upon multiplying all these 
together, an expression is obtained whose general form, suggested for 
n = 3, may be established by mathematical induction. Putting 
The integral of (73) over all possible values of the r's must of 
course be unity. This will be true also if N is replaced by N + 2k. 
A ready means is thus found for evaluating the integral of the product 
This last result was reached in a different manner by S. S. Wilks,¹ 
who proceeded to derive the exact distribution of w, which for n = 3 
is expressed as a multiple of a hypergeometric function, and for 
higher values of n as a complicated multiple integral. 

The result (73) may also be reached by starting from J. Wishart's 
distribution² of the sample covariances $a_{ij}$, putting $a_{ij} = r_{ij}s_is_j$, and 
integrating with respect to all the s's from 0 to infinity. 
1 Biometrika, Vol. XXIV, 1932, p. 492. 
2 Biometrika, Vol. XX A, 1928, p. 38, formula (9). 
If the observed correlations have been fully resolved into their 
principal components, the value 

$$w = k_1 k_2 \cdots k_n,$$

which is the determinant of the matrix of correlations, is easily computed. 
On the “sand” theory, the correlations should 
tend to vanish, and w should therefore be near unity. (w necessarily 
lies between 0 and 1.) A value near zero might arise with high 
correlations. A zero value would imply a number of independent components 
less than the number of tests, some of the k's then vanishing. 

The product of the k's for the example of Section 5 is .235. We 
may inquire what value of m (= N − 1), with this value of w, will 
make (73) a maximum. Since n = 4, we are to maximize 

$$\frac{[\Gamma(\tfrac12 m)]^3}{\Gamma[\tfrac12(m - 1)]\,\Gamma[\tfrac12(m - 2)]\,\Gamma[\tfrac12(m - 3)]}\; w^{\frac12(m - 5)}.$$

An approximate solution is obtained by equating the values taken by 
this expression when m − 1 and m + 1 are put for m. This leads to 
the cubic 

$$(m - 2)(m - 3)(m - 4) = (m - 1)^3 w,$$

which for w = .235 has the single real root 6.50. The actual maximizing 
value found by trial calculations from Legendre's table of the 
gamma function, after interpolating parabolically from the values 
m = 6, 7, 8, is about 6.8. 
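The cubic $(m-2)(m-3)(m-4) = (m-1)^3 w$ has no convenient closed-form root, but bisection finds it at once. The sketch below solves it for the observed w = .235. It reproduces only the approximate root of the cubic, not the trial calculation from the gamma-function table that leads to 6.8, and the bracketing interval [4, 50] is an assumption that holds for this value of w (the left side minus the right is negative at 4 and positive at 50).

```python
def cubic_root(w, lo=4.0, hi=50.0, iters=60):
    """Root m > 4 of (m-2)(m-3)(m-4) = (m-1)**3 * w, found by bisection."""
    def g(m):
        return (m - 2) * (m - 3) * (m - 4) - w * (m - 1) ** 3

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(round(cubic_root(0.235), 2))   # 6.48
```

The result, about 6.48, agrees with the root of about 6.50 stated in the text.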

The hypothesis of m or more principal components in the population 
of tests, making equal contributions to the total variance, while the 
remaining components are comparatively negligible, can be tested 
accurately with the help of w, provided a way can be found to integrate 
the distribution discovered by Wilks, from 0 to w. If the value of this 
integral is very small, we should reject the assumed value of m in 
favor of a smaller one. Likewise we could set an upper limit to plausible 
values of m by integrating from the observed value of w to 1. Since 
no expression for the integral in manageable form, and no tables, are 
now available, we must in such questions be content for the present 
either with such information as can be obtained from the moments 
(74), or from the use of Fisher’s table with individual correlations, or 
from our two kinds of maximum likelihood estimates, corresponding 
respectively to independent and to complete sets of correlations. 
THE FACTOR THEORY AND ITS TROUBLES: 


II. GARBLING THE EVIDENCE 
C. SPEARMAN 


I. PRESENT CALL FOR REVIEW OF SITUATION 


On various counts the present moment appears to be singularly 
opportune for reviewing the situation of the theory of “Two Factors.” 

One ground for such a review—and excuse for its being undertaken 
by the present writer—is afforded by his exceptional good fortune in 
that he has for some years been teaching successively at several of 
the most important centers in America for the study of such theories. 
At each of these great centers there have been held systematic con- 
ferences about the main points at issue. Among the participants 
of the conferences have been included, not only many of the leading 
experts of the present day, but also a large number of the younger 
generation who are likely to lead in days to come. After this fashion 
he has enjoyed unique opportunities for making himself acquainted 
with, and deriving benefit from, the most weighty views now current 
on the topic. To all those who have rendered this yeoman’s service 
he hereby again tenders his cordial thanks. 

Another reason for undertaking this review of the situation is the 
encouraging fact that the said theory, together with those cognate 
to it, has (under the inspiration of Professor Thorndike) arrived 
at a stage of intimate collaboration between many investigators, 
not only in America, but also in most other principal countries. 
Everywhere psychologists are being incited to present their views on 
the problem of determining the most fundamental characteristics 
of an individual. Based on such assembled contributions, and with 
the hope of utilizing the most valuable elements in them, plans are 
being made for very extensive and intensive further investigations. 

Above all, the generosity of the University of Chicago (in particular, 
its Department of Education) has supplied to Professor 
Holzinger and myself an opportunity for beginning to put these plans 
into actual execution. We are now well under weigh with what would 
seem to be the most systematic and comprehensive study of the whole 
topic that has so far been effected. 

An additional, but unfortunately not negligible, motive for the 
present review derives from the extraordinary way in which the theory 
of Two Factors has been misunderstood and misrepresented. The 
version of it given in this and the following contributions will, 
naturally, be in the first place that of the present writer at the moment of 
writing. Still he does believe that, as a matter of history, his changes 
of doctrine during the thirty years or so that have elapsed since its 
original formulation have involved little or no swervings to the right 
or to the left, but only straightforward advance. As much can be 
said, it appears, of nearly all other authors who have approached 
the topic in any constructive manner; we all seem to have moved 
together as a solid phalanx. 


II. CITATION OF IRRELEVANT DEFECTS 


In general, the series of articles constituting this review begins by 
considering the chief ways in which any critics have been led to think 
that the Theory is not confirmed by actual observation. One such 
way has been treated already in a previous article, which thus 
represents the first of the series; it is entitled “Pitfalls in the Use of ‘Probable 
Errors’” (this Journal, 1932). 

The present and second article will briefly deal with what may be 
designated as garbling the evidence; that is to say, picking out only 
those portions or aspects of it which suit the writer’s purpose. From 
such a tendency not even the most competent and judicious authorities 
would seem to have been quite free; as is shown by the traces of it 
that have appeared even in the recent Treatise of Professor Garrett 
and Dr. Anastasi.* 

In Part II of the Treatise (Part I was the chief topic of the above-mentioned 
“Pitfalls etc.”) two particular studies, those of Bonser and 
of Simpson, were singled out for very severe criticism in certain 
respects. And this adverse verdict was taken as a reproach against the 
Two Factor theory in general. 

Surely it was something less than justice on their part to give to 
their readers no hint that both the said studies charged against the 
supporters of the theory did not really proceed from these supporters 
at all, but on the contrary from the school most opposed to it. The 
interest of the supporters was limited to showing that even these studies, 
which were being used for a violent polemic against the theory, did 
in truth agree with it. 

*“The Tetrad-Difference Criterion and the Measurement of Mental Traits.” 
Published in Annals of the New York Academy of Sciences, 1932. 

But waiving such matters of more ethical than scientific moment, 
and for argument’s sake conceding that these or any other studies do 
suffer from certain grave defects, we have to face the general problem 
of method. Can or cannot we forthwith reject the testimony of such 
studies with respect to any such theory? I suggest in answer that 
we can fairly do so when, and only when, the defects and the theory 
are inter-related. And no such relation, it seems to me, has ever been 
brought forward by the class of critics that we are now considering. 
Thus, the chief defects brought against Bonser and Simpson are that 
some of their averages are made to include widely different individual 
values, that the distribution of some of their measurements is bi-modal, 
and some of the correlations may fail to reach a high reliability. But 
nowhere are we told what influence such faults may be expected to 
exercise upon the tetrads, which supply the criterion of the Two 
Factor theory. In point of fact, this criterion is satisfied with remark- 
able precision. The observed median tetrads were in the two cases 
respectively .062 and .013, whilst the values demanded by the theory 
were .061 and .011. 


III. SUPPRESSION OF BEST EVIDENCE 


Even graver than such restriction of the regard of critics to certain 
limited and even irrelevant aspects of the studies at issue is the fact 
that these studies themselves may be only a small selection out 
of a comparatively large number available. Is this selectiveness 
legitimate? 

As an example of the sort of case here in mind, the reproach has 
repeatedly been urged in a work of Kelley¹ that many studies supporting 
the theory of Two Factors have employed too small samples. The 
fact that some very large samples had also been used was left by him out of 
account; indeed, it was not so much as mentioned. Was he really 
justified in thus confining his scope to only a portion of the available 
evidence? 

To answer this question, we must remember that compound 
evidence is of two different kinds. In the one, the components stand 
in prolongation of each other. Thus, if A is B, B is C, and C is D, then 
this triple-linked chain leads to the conclusion that A is D. In such a 
case the strength of the chain is at least as weak as that of its weakest 
component; and then, undeniably, a critic can afford to leave out of 
account the other components, however strong. But in the other 
kind of compound evidence, matters are otherwise. Here the components 
run, not in prolongation of each other, but side by side. Thus 
three different observations may each of them independently afford 
evidence for some conclusion. Here, the strength of the whole evidence 
is at least as strong as that of the strongest link. To leave the 
latter out of account would be absurd. 

1 “Crossroads in the Mind of Man.” 1928. 

Now, it would seem that the testimony afforded by different 
studies is usually of the latter kind, where the respective evidences 
run side by side independently. In such cases, the crucial evidence 
for or against a theory must be sought in the best studies. Studies 
based on small samples or otherwise inferior can only play a subsidiary 
part (though by no means always a negligible one, especially in the 
case of numerous small samples). 

Accordingly, the critics we are considering must confront a decisive 
question. Are, or are not, the studies of Bonser and Simpson (with all 
their faults, real or alleged) the best that have ever been brought 
to bear on the theory of Two Factors? Surely to reply in the affirma- 
tive would be quite irrational. Among other things, both these 
studies are more than twenty years old! 

At the present moment, Garrett himself makes no such claim for 
them. Far otherwise, he writes to me as follows: 


My criticism of the work which you cited in your “Abilities of Man” is not of 
the more recent and more complete work of Stephenson and others, but rather of 
earlier and what I feel to be inconclusive studies, i.e., Bonser, Simpson, etc., Nov. 
4th, 1932. 


Still more positively he concludes: 


I am certain of the presence of a general factor in most tests of the “intelli- 
gence” sort. Ibidem. 


Accordingly, not having in his aforesaid Treatise mentioned the 
work of “Stephenson and others,” nor even his own acceptance of 
the general factor, he will be glad that these momentous supple- 
ments to his Treatise should now be made generally known. 
THE INFLUENCE OF MANUAL TRACING ON THE 
LEARNING OF SIMPLE WORDS IN THE CASE OF 
SUBNORMAL BOYS¹ 


SAMUEL A. KIRK 
Psychological Laboratory, University of Chicago 


The conventional method of teaching reading to the normal child 
has proved effective except in certain cases of reading disabilities in 
which it has failed to produce results. Dr. Fernald discovered that 
certain cases which were referred to the clinic as mentally retarded 
were not mentally retarded according to psychological tests, but were 
unable to learn to read after three or more years in the public schools. 
During a period of five years she found seven such cases who were of 
normal intelligence but who were unable to learn to read by the 
ordinary school method. After several weeks of training by the 
conventional “sight” method with individual instruction in the clinic, 
these children showed no improvement. When the method was 
changed and the child was required to trace the words with his fingers, 
and later to reproduce them in writing by memory, he learned very 
rapidly. After some simple words were learned by this method, 
simple sentences were taught by the same method and in a relatively 
short time these children learned to read. She concluded: 


Perhaps we can go no further in theory than to say that, in the specific cases 
studied, lip and hand kinaesthetic elements seem to be the essential link between 
the visual cue and the various associations which give it word meaning. In other 
words, it seems to be necessary for the child to develop a certain kinaesthetic 
background before he can apperceive the visual sensations for which the printed 
words form the stimulus. Even the associations between the spoken and the 
printed word seem not to be fixed without the kinaesthetic link.²


At the present writing, remedial teaching for cases of reading 
disabilities has taken into consideration many factors, such as indi- 
vidual instruction, phonics, tracing words or copying them, and other 
factors which influence the speed of learning. We do not know 





1 This study was conducted under the direction of Dr. A. G. Bills, of the
Department of Psychology. Acknowledgments are also due to the Illinois Institute
for Juvenile Research for their cooperation and their permission to carry on the
experiment at the Oaks School.

2 Fernald, G. M., and H. Keller: The Effect of Kinaesthetic Factors in the
Development of Word Recognition in the Case of Non-readers. J. Ed. Research,
Vol. IV, 1921, pp. 355-377.
whether the success of remedial teaching in cases of reading disabilities
as proposed by Dr. Marion Monroe¹ is due to the kinaesthetic element,
to phonics, to individual instruction, to other factors, or to all com- 
bined; nor do we know the relative efficacy of each factor. We do 
know, however, that remedial teaching as practiced in the various 
clinics, especially at the Illinois Institute for Juvenile Research, under 
the direction of Dr. Monroe, has proved very successful in teaching 
some children to read after the public schools have failed to teach 
them. 

Since many subnormal children have difficulty in learning to
read, the problem is to analyse the factors in learning for these children
so that methods of teaching them can be made more effective. The
kinaesthetic or tracing factor is believed to be effective; yet no one has
experimentally demonstrated its efficacy.

Dr. Fernald² ascribes the success of her teaching, in the cases
reported in her article, to the hand kinaesthetic element; but since 
many other factors were involved, and since her method was not 
experimentally compared with the conventional method, she could 
not definitely state that tracing the word aided in learning the word. 
She inferred, however, that the factor of tracing was the connecting 
link between the visual cue and the various associations which gave 
it word meaning. 




PROBLEM 


Our problem is to compare, experimentally, the relative efficacy 
of the manual tracing (kinaesthetic) method with the conventional 
“sight” method in the case of subnormal boys.


SUBJECTS AND MATERIALS 





1. Subjects.—The subjects used in the experiment were six boys
of ages 9-1 to 11-3, who had mental ages of 6-3 to 8-1, and whose IQ’s 
ranged from 63 to 80. None of the subjects could score on any 
standardized first grade test in reading. Table I shows the chronologi- 
cal ages, mental ages, intelligence quotients, and grade of the subjects. 

2. Materials Learned.—The materials learned were 150 commonly 
known three-letter words, printed in common script on four- by six- 
inch cards. The words were selected by tabulating all three-letter 





1 Methods of Diagnosis and Treatment of Cases of Reading Disabilities. 
Gen. Psychol. Mon., Nos. 4 and 5, 1928. 
2 Loc. cit., 376f. 
TABLE I.—SHOWING CA, MA, IQ, AND GRADE OF SUBJECTS

Subjects    CA      MA     IQ     Grade
Ba.         10-2    8-1    .80    Below I
Cl.         9-7     7-5    .78    Below I
Ro.         9-1     6-3    .69    Below I
Go.         10-0    7-2    .72    Below I
Br.         9-7     6-9    .72    Below I
Ca.         11-3    7-1    .63    Below I

















words and eliminating those whose meanings were abstract. A cross 
section of the words selected is: ask, new, eat, pot, tin, ham, win, pig, 
cap, hog, etc. 


METHOD OF EXPERIMENTATION 


The method of equating the lists of words was as follows: The 
words were printed in script on separate cards, shuffled, and a list of 
five words was drawn from the pack of cards each day to exclude, as 
much as possible, the chance factor of having more difficult words in 
one list than in another. Thirty lists of five words each were drawn 
from the cards during the course of the experiment, fifteen of which 
served as lists for method “A” and fifteen for method “B.” The 
methods will be described below. 
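The list-drawing procedure above can be sketched in Python. This is an illustration under stated assumptions, not the author's procedure: `draw_word_lists` is a hypothetical helper, the word strings are placeholders for the 150 printed cards, and the sketch assumes the lists are dealt without replacement (thirty lists of five use up the 150 cards exactly) and simply alternated between method "A" and method "B".

```python
import random


def draw_word_lists(words, n_lists=30, list_size=5, seed=None):
    """Shuffle the pack once and deal it into disjoint lists of `list_size` words.

    Assumption: lists are drawn without replacement, so no word is reused.
    Returns (lists for method "A", lists for method "B"), alternating.
    """
    if n_lists * list_size > len(words):
        raise ValueError("not enough cards for the requested lists")
    rng = random.Random(seed)
    pack = list(words)
    rng.shuffle(pack)
    lists = [pack[i * list_size:(i + 1) * list_size] for i in range(n_lists)]
    return lists[0::2], lists[1::2]


# Hypothetical stand-ins for the 150 printed cards.
words = [f"w{i:03d}" for i in range(150)]
a_lists, b_lists = draw_word_lists(words, seed=1)
print(len(a_lists), len(b_lists))  # 15 15
```

Shuffling once and dealing consecutive slices is equivalent to drawing each day's five cards at random from the cards not yet used.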

After some preliminary testing it was found that, for our subjects, 
a list of five words could be learned without prolonged effort, and that 
the presentation time for both methods should be seven seconds per 
word. 

A. The Conventional “Sight” Method.—By this method the subject
was presented with a card with one of the words on it. He was told
the word and was asked to look at it and say it. At first a “test series”
was given; that is, each word was exposed for a five-second interval,
testing the subject to determine how many words he knew initially,
or in the case of learning (after the presentations) how many of the
words he recalled. This was done by exposing the word and saying,
“What is this word?” If the subject knew any of the words on the
initial “test series,” they were excluded from his list and unknown
words were given as substitutes. (Incidentally, only a few words were
recognized by several of the subjects during the course of the experiment.)
After the initial “test series” a presentation of seven seconds
per word followed. The procedure of presenting the words for learning













528 The Journal of Educational Psychology 


was as follows: The subject was seated opposite the experimenter.
The experimenter presented a word and said slowly, “Look at this
word; this word is ‘dog.’” The subject responded “dog.” On the
preliminary tests the subject was trained to respond immediately after
the experimenter had spoken the word, but during the experiment it
was not necessary to tell the subject to repeat the word, for he quickly
became familiar with the method and responded immediately after
he heard the word. The time for each presentation was seven seconds,
which gave the subject sufficient time to look at the word, hear it, and
then repeat it once. The time was kept with a stop watch. Presentations
for learning alternated with “test series,” which recorded the number
of words recalled after each presentation, until the subject reached the
criterion of one perfect repetition of the series of five words. The number
of presentations for learning was recorded as the measure of learning.

The words were presented in mixed order by shuffling the five cards
before each “test series” and before each presentation. Twenty-four
hours later, the subject relearned the list of five words by the same 
method. The recall and the relearning trials were recorded as the 
measures of retention. 

Although the “first recall” may have been sufficient as a measure
of retention, it was thought that an additional check should be made;
therefore the list was relearned and the “savings score,” derived from
the formula

$$\text{Savings score} = 1 - \frac{\text{Relearning trials}}{\text{Learning trials}},$$

was calculated.
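The savings score can be expressed as a small function. This is a minimal sketch. `savings_score` is a hypothetical name, and the trial counts in the example are invented. The result is multiplied by 100 on the assumption that the "per cent of saving" reported in the tables is this fraction expressed as a percentage.

```python
def savings_score(learning_trials, relearning_trials):
    """Per cent saved on relearning: (1 - relearning / learning) * 100."""
    if learning_trials <= 0:
        raise ValueError("learning_trials must be positive")
    return (1 - relearning_trials / learning_trials) * 100


# Hypothetical example: a list learned in 6 trials and relearned in 2 the next day.
print(round(savings_score(6, 2), 1))  # 66.7
```

Under this formula, a list that needed no relearning trials would score 100 per cent, and one that took as many trials to relearn as to learn would score 0.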

B. The Manual Tracing (Kinaesthetic) Method.—In method “A,”
described above, the subject looked at, heard, and said the word,
while in the manual tracing method a fourth factor, namely tracing
the word with a dull pencil, was added. On the preliminary tests
the subjects were asked to trace the word at a certain speed, and they
retained that speed throughout the experiment. It was found that
they could be trained to look at, hear, say, and trace the word in seven
seconds with ease. As in method “A,” the subjects were first presented
with a “test series,” then a presentation. The method of
presentation was as follows: The experimenter laid the card on the
table as in method “A,” and, having given the subject a dull pencil,
said, “This word is ‘cat.’ Say it and trace it.” After the preliminary
tests the subjects knew what was required of them, and the experimenter
said only, “This word is ‘cat.’ Trace it.” The subjects repeated
the word and traced it. As in method “A,” the time for each presentation
was seven seconds and the recall or “test series” was five seconds. The
words were presented in mixed order and the criterion of learning was
one perfect repetition of the series. The trials to learn were recorded
as the measure of learning. On the following day the words recalled
and the number of trials to relearn by the same method were recorded
as the measures of retention. This method will be called method “B.”

Method “A” and method “B” were used on the same subject on 
alternate days. Two practice trials to familiarize the subjects with 
the different methods were given and not included in the final data. 
On the first day three subjects learned a list of five words by method 
“A,” and the other three learned the same list by method “B.” On 
the second day retention was measured by the same method that was 
used in learning, and another list of five words, selected at random from 
the cards, was learned by the other method. In other words, three 
subjects learned on successive days by methods “A,” “B,” “A,” 
“B,” etc., while the other three learned the same lists in the following
order, “B,” “A,” “B,” “A,” etc., until each subject had learned 
fifteen lists of five words each by method “‘A” and fifteen lists of five 
words each by method “B.” Subject Ca. was forced to discontinue 
because of illness after the twenty-fourth practice period, giving him 
twelve lists for each method. 
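The counterbalanced order of methods can be checked with a short sketch. The subject codes and period counts come from the text: fifteen lists per method, or twelve for Ca. The text does not report which three subjects began with method "A", so that assignment is hypothetical. The helper `method_schedule` is also hypothetical.

```python
from collections import Counter


def method_schedule(first_method, n_periods):
    """Return the method ("A" or "B") used in each learning period, alternating daily."""
    second = "B" if first_method == "A" else "A"
    return [first_method if i % 2 == 0 else second for i in range(n_periods)]


# Learning periods per subject (practice trials excluded); Ca. stopped after 24.
periods = {"Ba.": 30, "Cl.": 30, "Ro.": 30, "Go.": 30, "Br.": 30, "Ca.": 24}
# Hypothetical: the source does not say which three subjects began with "A".
starts_with = {"Ba.": "A", "Cl.": "A", "Ro.": "A", "Go.": "B", "Br.": "B", "Ca.": "B"}

totals = Counter()
for subject, n in periods.items():
    totals.update(method_schedule(starts_with[subject], n))
print(totals["A"], totals["B"])  # 87 87
```

Whatever the starting assignment, the alternation yields 87 learning periods per method (5 × 15 + 12). This is the number of practice periods behind the group results.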

An attempt has been made to compare two different methods, the
conventional “sight” method, which is sometimes called the “look
and say” method, and the manual tracing method, commonly
referred to as the “kinaesthetic” method. The writer has avoided
the concept of “kinaesthesis” because it has been used rather loosely,
implying that kinaesthesis alone is the predominating factor; whereas,
actually, many other factors enter in, such as cutaneous and other
somesthetic factors which cannot be isolated in the act of tracing or
writing a word. The cutaneous factor has been minimized in this
experiment as much as possible by requiring the subjects to trace the
words with a pencil rather than with their fingers. Furthermore, in
order to approximate a life situation, the words were printed in normal-sized
script rather than exaggerated print, the use of which obviously
involves more somesthetic activity.


RESULTS 


1. Individual Results.—Individual results, showing a comparison
of the manual tracing method “B” and the conventional “sight”
method “A,” in terms of (1) trials to learn, (2) retention by means
of “first recall,” and (3) retention by means of the per cent of saving
in relearning or the “savings score,” will be given below in Tables II,
III, and IV, respectively. The purpose of this section will be to show
the general tendency for each subject when the average of fifteen
practice periods by each method is calculated. The probable errors
were calculated by the formula:¹

$$PE_{\text{diff}} = \sqrt{PE_{M_1}^2 + PE_{M_2}^2 - 2\,r_{12}\,PE_{M_1}\,PE_{M_2}}$$
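The formula for the probable error of a difference translates directly into code. This is a minimal sketch. `pe_difference` is a hypothetical name. The PEs in the usage lines are made up, because the PEs of the separate means are not reported here.

```python
import math


def pe_difference(pe_m1, pe_m2, r_12=0.0):
    """Probable error of M1 - M2: sqrt(PE1**2 + PE2**2 - 2 * r12 * PE1 * PE2)."""
    return math.sqrt(pe_m1 ** 2 + pe_m2 ** 2 - 2 * r_12 * pe_m1 * pe_m2)


# Made-up PEs for illustration: uncorrelated means, then positively correlated means.
print(pe_difference(0.3, 0.4))
print(round(pe_difference(0.3, 0.4, r_12=0.5), 4))
```

Because the same subjects learned by both methods, a positive r12 between their scores reduces the PE of the difference. With r12 = 0, the expression reduces to the square root of the sum of the squared PEs.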


TABLE II.—SHOWING AVERAGE PERFORMANCE OF INDIVIDUAL SUBJECTS IN TERMS
OF TRIALS TO LEARN

              Methods
Subjects    M_b     M_a     D_b−a
Ba.         7.61    8.33     −.72
Cl.         7.53    5.60     +.93
Ro.         5.86    5.54     +.32
Go.         4.87    5.27     −.40
Br.         6.33    8.20    −1.87
Ca.         8.33    7.00    +1.33














Table II shows the mean number of trials for each subject after
fifteen practice periods by each method. M_b refers to the mean
number of trials for each subject when the manual tracing method was
used, and M_a to the mean number of trials for each subject when the
conventional “sight” method was used. D_b−a is the difference between
the two means. The results indicate that one-half of the subjects
required fewer trials to learn by the conventional method and the other
half required fewer trials to learn by the manual tracing method. No
conclusions can be derived from the data in terms of trials to learn,
since the efficacy of either method varied according to the subject.
The differences, in any case, are probably due to chance factors.

Table III shows a comparison of the two methods in terms of the 
number of words recalled after a period of twenty-four hours. The 
headings of the first three columns are the same as in Table II. 
The fourth column, “PE_diff,” is the probable error of the difference of
the two means. Under “Ratio” we have calculated the ratio of the
difference of the two means to its probable error.

1 Holzinger, K. J.: “Statistical Methods for Students of Education.” Boston,
Ginn and Co., 1928, p. 243.

TABLE III

              Methods
Subjects    M_b      M_a      D_b−a    PE_diff   Ratio
Ba.         2.467    1.200    1.267    .235      5.4
Cl.         2.000    1.533     .467    .237      2.0
Ro.         2.800    2.267     .533    .247      2.2
Go.         1.533    1.333     .200    .135      1.5
Br.          .867     .533     .334    .152      2.2
Ca.         1.500    1.167     .333    .330      1.0

The results show that every subject, without exception, retained a greater number of words
when the manual tracing method was used, although the differences 
were not significant for most of the subjects. The purpose of the table 
was to show the consistency of the results among the subjects. 
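As a consistency check, the Ratio column of Table III can be recomputed from the printed D_b−a and PE_diff columns. The values below are copied from the table. The dictionary layout is only for illustration.

```python
# (D_b-a, PE_diff) for each subject, as printed in Table III.
table_iii = {
    "Ba.": (1.267, 0.235),
    "Cl.": (0.467, 0.237),
    "Ro.": (0.533, 0.247),
    "Go.": (0.200, 0.135),
    "Br.": (0.334, 0.152),
    "Ca.": (0.333, 0.330),
}

# Ratio of each difference to its probable error, rounded as in the table.
ratios = {s: round(d / pe, 1) for s, (d, pe) in table_iii.items()}
print(ratios)
```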


TABLE IV.—SHOWING AVERAGE RETENTION OF INDIVIDUAL SUBJECTS IN TERMS OF
THE PERCENTAGE OF SAVINGS BY RELEARNING

              Methods
Subjects    M_b      M_a      D_b−a    PE_diff   Ratio
Ba.         75.40    58.65    16.75    4.91      3.41
Cl.         76.50    52.71    23.79    5.54      4.20
Ro.         77.05    63.03    14.02    3.81      3.60
Go.         58.68    54.32     4.36    5.40       .80
Br.         67.51    54.62    12.89    5.46      2.36
Ca.         70.48    54.54    15.94    6.75      2.30




















The column headings in Table IV are the same as in Table III.
Again, the data indicate that retention is greater when the manual
tracing method is used, since every subject had a greater average
retention score by the savings method, although the ratios of the
difference of the means to their PE’s are not all significant. This table
shows the general tendency of each subject, and the absence of
inconsistencies among the subjects in the retention scores.

2. Group Results.—Tables II, III, and IV compare the two methods 
for individual subjects. In this section the data will be used collectively
to determine the average results for the group as a whole.
The graphs below will show the average performance of the group
with eighty-seven practice periods by each of the two methods in terms
of (1) trials to learn, (2) retention by means of the number of words
recalled, and (3) retention by means of the per cent of savings by
relearning. Graphs 1, 2, and 3 were smoothed by calculating the
average for the group for two-day intervals.

[Figure 1: Average Trials to Learn, manual tracing method and conventional “sight” method]

[Figure 2: Number of Words Recalled]
Figure 1 is a graphical comparison of the manual tracing and the
conventional methods in terms of learning trials for the group as a
whole. The mean for the manual tracing method was 6.55, whereas
the mean for the conventional “sight” method was 6.40. The insignificant
difference of .15 in favor of the conventional method indicates
that there is practically no difference in terms of trials to learn
between the two methods.

Figure 2 is a graphical comparison of the manual tracing and the 
conventional methods in terms of retention by means of the average 
number of words recalled. 


Mean of manual tracing method (B) ................ 1.874
Mean of conventional method (A) .................. 1.345
Difference (D_b−a) ...............................  .529
PE of the difference (PE_diff) ...................  .084


The ratio of the difference of the two means to the probable error of
that difference is 6.3 in favor of the manual tracing method. It is
obvious, from the graph and the accompanying figures, that, in terms of
the number of words recalled, the manual tracing method is superior to
the conventional “sight” method.
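The group means appear to be the individual means of Tables III and IV weighted by the number of practice periods behind each: fifteen per method for five subjects, twelve for Ca. This reading is an inference; the text does not state it. The sketch below uses a hypothetical helper, `pooled_mean`, and shows that this weighting reproduces the reported group means: 1.874 and 1.345 words, and 70.95 and 56.37 per cent.

```python
def pooled_mean(means, weights):
    """Mean of subject means, weighted by the number of practice periods behind each."""
    return sum(m * w for m, w in zip(means, weights)) / sum(weights)


# Order: Ba., Cl., Ro., Go., Br., Ca. (Ca. contributed 12 periods per method).
periods = [15, 15, 15, 15, 15, 12]

recall_tracing = [2.467, 2.000, 2.800, 1.533, 0.867, 1.500]   # Table III, M_b
recall_sight = [1.200, 1.533, 2.267, 1.333, 0.533, 1.167]     # Table III, M_a
savings_tracing = [75.40, 76.50, 77.05, 58.68, 67.51, 70.48]  # Table IV, M_b
savings_sight = [58.65, 52.71, 63.03, 54.32, 54.62, 54.54]    # Table IV, M_a

print(round(pooled_mean(recall_tracing, periods), 3))   # 1.874
print(round(pooled_mean(recall_sight, periods), 3))     # 1.345
print(round(pooled_mean(savings_tracing, periods), 2))  # 70.95
print(round(pooled_mean(savings_sight, periods), 2))    # 56.37
```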

Figure 3 is a graphical comparison of the manual tracing and the
conventional “sight” methods in terms of retention by means of the
per cent of savings by relearning.


Mean of manual tracing method (B) ................ 70.95 per cent
Mean of conventional method (A) .................. 56.37 per cent
Difference (D_b−a) ............................... 14.58 per cent
PE of the difference (PE_diff) ...................  2.05


The ratio of the difference of the two means to its probable error is
therefore 7.1, which is considered significant in favor of the manual
tracing method.
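The two group ratios can be reproduced from the reported means and probable errors. This is a minimal sketch with a hypothetical helper, `critical_ratio`. The last lines convert each ratio from PE units to standard-deviation units using PE = 0.6745 σ. The text itself does not make this conversion.

```python
def critical_ratio(mean_b, mean_a, pe_diff):
    """Difference of two means (B - A) divided by the probable error of that difference."""
    return (mean_b - mean_a) / pe_diff


recall_ratio = critical_ratio(1.874, 1.345, 0.084)   # words recalled
savings_ratio = critical_ratio(70.95, 56.37, 2.05)   # per cent of savings
print(round(recall_ratio, 1), round(savings_ratio, 1))  # 6.3 7.1

# The same ratios expressed in standard-deviation units (PE = 0.6745 * SD).
print(round(recall_ratio * 0.6745, 1), round(savings_ratio * 0.6745, 1))  # 4.2 4.8
```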


SUMMARY OF RESULTS 


I. Tables II, III, and IV show in a comparative manner the average
individual performance of the subjects by the manual tracing
method and by the conventional “sight” method.

1. There is great variability among the subjects in terms of trials
to learn, since some of the subjects had fewer trials to learn by the
conventional method, while others had fewer trials to learn by the
manual tracing method.

2. In every case, without exception, retention was greater by both
measures of retention when the manual tracing method was used.

II. Figures 1, 2, and 3 show the average results for the group as a
whole.

1. There is practically no difference in terms of trials to learn
between the two methods.

2. Retention was significantly greater when the manual tracing
method was used, since the difference between the two means was over
six times the probable error of the difference of the two means in terms
of “first recall,” and over seven times in terms of the “savings scores.”





DISCUSSION 


Although it has been demonstrated that teaching reading with the
added tracing factor is superior to the conventional “sight” method,
for our subnormal subjects, in terms of retention, further research must
be conducted to determine whether this conclusion is applicable to all
subnormal children, to normal children, or whether the subnormal
and the normal differ in these respects.

The writer cannot propose an adequate theory to explain why 
retention is greater when the child traces a word except to suggest that 
there may be a correlation between somesthetic activity and the recall 
of verbal symbols. The writer has observed, while teaching reading to
certain reading disability cases, that they were able to reproduce, 
for example, the sound “sh” after they had traced the word and not 
before. It appeared that the muscular and kinaesthetic element aided 
in verbally reproducing the sound “sh.” Of course, the subjects in
this experiment recalled before tracing, but we do not know whether 
there were any implicit somesthetic movements involved which may 
have aided recall. We noticed, however, that there was a more sig- 
nificant difference for the savings score, which involved relearning by 
tracing the word, than for the “first recall.” 

The fact that the average number of trials to learn was practically equal
for both methods cannot adequately be explained. The writer suggests
the possibility that certain children may benefit by tracing a word,
both in terms of learning trials and retention, while others may be
distracted by tracing and consequently require more trials to
learn. In the latter case tracing would serve as an obstacle to learning,
and because of the greater effort involved in learning (in terms of a
greater number of trials required) the retention score would be higher
for these cases.

No implications are made that this assumption is true. However, 
further research must be directed to determine wherein tracing is 
beneficial in terms of both learning and retention, wherein it is bene- 
ficial only in terms of retention, and also in what cases it is totally 
detrimental. In discussing reversal tendencies in the case of non- 
readers, Dearborn¹ states that the Fernald tracing method has proved
adequate for certain types, “but when the sinistral tendencies of hand 
and eye are strong, the Fernald method may aggravate rather than 
ameliorate the condition unless it is carefully managed.” 

Whether left-eyedness, left-handedness, change of handedness, or
certain visual defects such as heterophoria have any relation to learning
and retention as taught by the “sight” method or the tracing method is
still a question. Incidentally, subject Br. in this experiment was both
left-handed and left-eyed, and benefited by tracing both in terms of
learning trials and by both measures of retention. None of the other 
subjects exhibited any sinistral tendencies. 





1 Dearborn, F. W.: Teaching Reading to Non-readers. Elementary School 
Journal, Dec., 1929. 
THE SAMPLING DISTRIBUTION OF THE TETRAD
EQUATION¹


HENRY E. GARRETT 
Columbia University 


1. THE PROBLEM


The purpose of this study was to investigate experimentally the 
sampling distribution of the tetrad equation. The problem may be 
stated concretely as follows: (1) If the same tests are administered to 
successive samples drawn from a given population, and the inter- 
correlations calculated separately for each sample, will the tetrad 
differences computed from these r’s fall into a normal distribution; 
(2) Do present formulas for the PE of a tetrad difference adequately 
describe the sampling distribution? 

The value of the tetrad equation as a criterion of the presence of a 
single general factor running through a group of tests, and the impor- 
tance of this method in Spearman’s theory of Two Factors have been 
often discussed,* and need not be repeated here. For purposes of 
orientation, however, a brief description of the tetrad method will 
be given. 


2. THE TETRAD EQUATION 


From any four tests, x1, x2, x3, and x4, it is possible to compute six
intercorrelations, viz., r12, r13, r14, r23, r24, and r34, the subscripts in each
case denoting the tests correlated. By taking these correlations in
groups of four we can construct six equations as follows:

$$
\begin{aligned}
t_{1234} &= r_{12}r_{34} - r_{13}r_{24} & \qquad t_{1324} &= r_{13}r_{24} - r_{12}r_{34} \\
t_{1243} &= r_{12}r_{34} - r_{14}r_{23} & \qquad t_{1423} &= r_{14}r_{23} - r_{12}r_{34} \\
t_{1342} &= r_{13}r_{24} - r_{14}r_{23} & \qquad t_{1432} &= r_{14}r_{23} - r_{13}r_{24}
\end{aligned}
$$

Each of these expressions is a tetrad equation, the value on the 
right side being a tetrad difference, or TD. Only the first two TD’s 
are independent. The third is the difference between equations two 
and one, while the other three TD’s are simply the first three with 
signs changed. Spearman® has shown that whenever four tests yield 
TD’s which are equal to zero, the intercorrelations among these tests 
may be thought of as having arisen from a single general factor running 
through all of the tests. When the tetrad differences differ reliably from
zero, additional bonds or “group” factors must be postulated in addition
to the general factor. Or the general factor may be entirely
absent, only group factors being present.

1 This study was made possible by a grant from the Unitary Traits Committee.
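The three independent tetrad differences can be computed directly from a set of intercorrelations. This is a minimal sketch; the function name is hypothetical. Applied to the "true" r's listed with Table I, all three differences vanish to within the rounding of the four-place coefficients. The third also equals the second minus the first, as stated above.

```python
def tetrad_differences(r):
    """Return (t1234, t1243, t1342) from a dict of correlations keyed by (i, j), i < j."""
    def c(i, j):
        return r[(min(i, j), max(i, j))]

    t1234 = c(1, 2) * c(3, 4) - c(1, 3) * c(2, 4)
    t1243 = c(1, 2) * c(3, 4) - c(1, 4) * c(2, 3)
    t1342 = c(1, 3) * c(2, 4) - c(1, 4) * c(2, 3)
    return t1234, t1243, t1342


# "True" correlations listed with Table I.
true_rs = {(1, 2): 0.7698, (1, 3): 0.5773, (1, 4): 0.3574,
           (2, 3): 0.4800, (2, 4): 0.2971, (3, 4): 0.2228}
tds = tetrad_differences(true_rs)
print([f"{t:+.5f}" for t in tds])
```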


3. THE PE OF A TETRAD DIFFERENCE 


Even when four tests are known to possess only one common factor, 
the TD’s calculated from the intercorrelations of these tests will vary 
somewhat from zero, owing to differences in sampling, and errors of 
measurement. Hence the significance of any given TD—its value 
as an indicator of the presence or absence of a general factor—will 
depend directly upon the size of its PE. If a TD is not reliably greater than
than zero (in absolute value), we may assume that only a single general 
factor is present. On the other hand, if a TD, when evaluated in 
terms of its PE, is significantly different from zero, a general factor 
may or may not be present. Several formulas have been derived for 
the PE of a TD. Spearman and Holzinger’ in 1924 published a 
formula for the PE of a TD when the “true” value of the tetrad is 
zero. Moul and Pearson® and Kelley’ have published more general 
formulas from which the PE of any calculated TD may be found. 
Wishart® has also derived a formula for the PE of a TD. All of these 
formulas assume in their practical application a normal distribution 
of TD’s in successive samples. 


4. PRESENT EXPERIMENT 


The experiment described in this paper is concerned with deter- 
mining empirically the distribution of a tetrad difference in successive 
samples when the “true” value of the TD is known. Each observed 
TD distribution is compared with the theoretical normal distribution 
called for by the PE formula. The procedure in detail was as follows. 
Four ‘‘tests’’ were made up as shown in Table I, each test containing 
a common general factor and from one to three specific factors. The
weights assigned the general factor are entirely arbitrary, and were 
selected in order to give a fairly wide range of r values. The “‘true”’ 
correlation coefficients for these four artificial tests are given in Table I. 
All of the tetrad equations deriving from the six intercorrelations are, 
of course, equal to zero. 
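Suppose each test score is a weighted sum of independent factors of equal variance. Then the "true" r between two tests is the product of their general-factor weights divided by the product of the tests' standard deviations. The sketch below assumes that model. It also assumes a particular reading of the legible rows of Table I, which is damaged in this copy: weights 5; 1, 1 for test 1, 4; 2, 2, 1 for test 2, and 2; 4, 3 for test 4, with the first weight on the general factor. Under that reading it reproduces the printed r12, r14, and r24. The helper `true_r` and the specific-factor numbers are hypothetical.

```python
import math


def true_r(test_i, test_j):
    """r between two tests built as weighted sums of independent, equal-variance factors.

    Each test is a dict {factor_number: weight}; only factors the two tests
    share contribute to the covariance.
    """
    cov = sum(w * test_j[f] for f, w in test_i.items() if f in test_j)
    sd_i = math.sqrt(sum(w * w for w in test_i.values()))
    sd_j = math.sqrt(sum(w * w for w in test_j.values()))
    return cov / (sd_i * sd_j)


# Assumed reading of Table I: factor 1 is the general factor; specific-factor numbers are arbitrary.
test_1 = {1: 5, 2: 1, 3: 1}
test_2 = {1: 4, 4: 2, 5: 2, 6: 1}
test_4 = {1: 2, 8: 4, 9: 3}

for name, (a, b) in {"r12": (test_1, test_2), "r14": (test_1, test_4),
                     "r24": (test_2, test_4)}.items():
    print(name, round(true_r(a, b), 4))
```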

Allowing the factor weights to remain constant for each test, we 
may determine the efficiency of each factor in contributing to the score 
of a hypothetical subject by throwing dice, drawing cards at random 
from a pack, or by some similar device. Scores on the four tests for 
a given subject are obtained by multiplying each factor weight in the 
test by the value (subject’s efficiency) on the die or card assigned to
it. Experimental correlation coefficients based upon samples of varying
size may then be calculated. A method of this sort, in which
scores were obtained from dice throws, has been employed by Thomson¹


TABLE I.—FOUR TESTS, EACH OF WHICH CONTAINS A GENERAL FACTOR
(DIFFERENTLY WEIGHTED) PLUS SPECIFIC FACTORS






































ON re eee re 1 2 3 4 5 6 7 8 9 
Gs cada hie dnkeaind 5 1 1 
, RRR ie Sia GR 8 4 2 2 1 
TER 6 6.0'b c's Fae os 64 2 3 4 
We ie o's See wb bee e ows 2 4 3 
“True” r’s
r12 = .7698     r23 = .4800
r13 = .5773     r24 = .2971
r14 = .3574     r34 = .2228





and by Hull. On what seems to be valid grounds, however, this 
method has been criticized by Spearman® and hence was discarded in 
the present study. The six faces upon a die are equally probable, 
and hence a given individual has equal chances of obtaining high, low, 
or average efficiency values for a given test factor. This assumes the 
distribution of abilities to be rectangular rather than normal, an assumption
which runs counter to nearly all established experimental studies.
Ordinarily, the occurrence of medium or average scores in a given test 
is far more probable than the occurrence of high or low scores. Instead 
of dice values, therefore, we have used abscissae values (in terms of 
sigma) drawn at random from a normal universe. With such values, 
of course, the probability of an individual receiving medium factor 
weight values is much greater than the probability of his receiving 
high or low factor weight values. 

Our work was considerably expedited by the use of a table, con- 
taining four thousand drawings from a normal universe, taken from 
Shewhart.* Shewhart’s table was constructed as follows. A total of 
nine hundred ninety-eight circular “chips” were given markings of 0,
+.1, −.1, +.2, −.2, and so on, the markings increasing by increments
of +.1 or −.1 up to +3.0 and −3.0. The number of chips receiving
each mark was arranged so that the distribution of values around
zero was normal. Thus, forty chips were marked 0; thirty-five were
marked +.5; and thirty-five −.5 and so on, only one chip being
marked +3 and one chip −3. A total of four thousand drawings of
one chip each were made with replacement, i.e., after a chip was
drawn from the bowl it was replaced and mixed thoroughly with the 
others before another chip was drawn. The entries in Shewhart’s 
table, therefore, represent random drawings from a set of values, the 
frequency distribution of which follows the normal probability curve. 
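A bowl of this kind can be imitated in a few lines. The sketch is an approximation, not Shewhart's actual tally. It sets the count for each mark to 40 × exp(−x²/2), rounded. This gives 40 chips at 0 and 35 at ±.5, as in the text, but 996 chips in all (against 998) and none at ±3.0. The name `normal_bowl` is hypothetical; `random.Random.choices` draws with replacement.

```python
import math
import random


def normal_bowl(peak=40):
    """Chips marked -3.0 ... +3.0 in steps of .1, with counts shaped like the normal curve.

    Approximation: count = round(peak * exp(-x**2 / 2)); this is not Shewhart's exact tally.
    """
    bowl = []
    for k in range(-30, 31):                  # k tenths of a standard deviation
        mark = k / 10
        count = round(peak * math.exp(-mark ** 2 / 2))
        bowl.extend([mark] * count)
    return bowl


bowl = normal_bowl()
rng = random.Random(0)
draws = rng.choices(bowl, k=4000)             # each chip is replaced before the next draw
print(len(bowl), bowl.count(0.0), bowl.count(0.5))
print(round(sum(draws) / len(draws), 3))
```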

For the purposes of the present study two changes were made in 
the values entered in Shewhart’s table. First, the 0 point was transferred
to −3 SD, so that there were no negative values; and secondly,
each value was multiplied by 10 to eliminate decimals. A total of 


TABLE II.—AVERAGE “r” VALUES FROM TWO HUNDRED SAMPLES OF TWENTY-FIVE
CASES EACH COMPARED WITH “TRUE” VALUES

Correlation   “True”   Calculated   PE (“true” r)   PE (calculated r)   PE (calculated r)
coefficient   value    value        by formula      from distribution   by formula
r12           .7698    .7587        .0550           .0653               .0573
r13           .5773    .5672        .0899           .0963               .0915
r14           .3574    .3630        .1177           .1108               .1171
r23           .4800    .4758        .1038           .1070               .1044
r24           .2971    .3048        .1230           .1259               .1224
r34           .2228    .2468        .1282           .1251               .1267




















TABLE III.—AVERAGE TD VALUES FROM TWO HUNDRED SAMPLES OF TWENTY-FIVE
CASES EACH COMPARED WITH THE “TRUE” TD’S; DATA ON NORMALITY OF DISTRIBUTION

                                                                    Test for normality
Tetrad       TD          TD             PE (“true” TD)  PE (calculated TD)  β1      β2       √β1     PE of √β1  PE of β2
difference   (“true”)    (calculated)   by formula      from distribution
t1234        .0000       .0131          .0890           .0906               .0106   3.5292   .1030   .1168      .2337
t1243        .0000       .0117          .0823           .0826               .0067   3.0018   .0819   .1168      .2337
t1342        .0000       .0022          .0525           .0517               .0287   3.3259   .1694   .1168      .2337
tise: .0000 .0022 .0525 .0517 .0287|3.3259; .1694) .1168) .2337 
































five thousand drawings was then made from Shewhart’s table and
“scores” in each of the four tests described in Table I computed for a
sample of five thousand subjects. These five thousand scores were
assembled into two hundred groups of twenty-five each, in order that
sampling effects from group to group might be studied.
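The whole sampling experiment can be sketched as a small simulation. This swaps in `random.gauss` for the drawings from Shewhart's table. The factor weights are an assumed reading of the damaged Table I: 5; 1, 1 for test 1, 4; 2, 2, 1 for test 2, and 2; 4, 3 for test 4. Test 3, whose row is illegible in this copy, gets hypothetical weights of 3; 4. Shifting the efficiencies by 3 SD and multiplying by 10, as described above, would not change any correlation, so the sketch omits that step. The function names and the seed are arbitrary.

```python
import math
import random
import statistics

# Factor weights for each test; factor 0 is the general factor, 1-8 are specifics.
# Tests 1, 2 and 4 follow an assumed reading of Table I; test 3 is hypothetical.
TESTS = [
    {0: 5, 1: 1, 2: 1},
    {0: 4, 3: 2, 4: 2, 5: 1},
    {0: 3, 6: 4},
    {0: 2, 7: 4, 8: 3},
]
N_FACTORS = 9


def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sample_t1234(rng, n=25):
    """Simulate n subjects, score the four tests, and return t1234 for the sample."""
    scores = [[] for _ in TESTS]
    for _ in range(n):
        efficiency = [rng.gauss(0.0, 1.0) for _ in range(N_FACTORS)]
        for t, weights in enumerate(TESTS):
            scores[t].append(sum(w * efficiency[f] for f, w in weights.items()))

    def r(i, j):
        return pearson(scores[i - 1], scores[j - 1])

    return r(1, 2) * r(3, 4) - r(1, 3) * r(2, 4)


rng = random.Random(1932)
tds = [sample_t1234(rng) for _ in range(200)]    # 200 groups of 25
mean_td = statistics.fmean(tds)
pe_td = 0.6745 * statistics.pstdev(tds)          # PE = 0.6745 * SD
print(round(mean_td, 4), round(pe_td, 4))
```

The printed mean and PE vary with the seed. With these weights, the true value of t1234 is zero, so the mean should lie close to it.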


5. RESULTS 


Fig. 1.—Frequency Distributions for Six r’s Calculated from 200 Groups of 25 Cases Each.

The range of values for each of the two hundred calculated correlation
coefficients is given in Fig. 1; and the average r’s with their standard
deviations are shown in Table II. The probable errors of the r’s
as calculated by the usual formula (for N = 25) show close agreement
with the PE’s computed around the average r for each distribution.
The distributions of correlation coefficients were not tested for normality;
but from inspection it is evident that the distributions of low and
intermediate r’s are more nearly normal than the distribution of the
high r.? 
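The “usual formula” for the probable error of r is PE = .6745(1 − r²)/√N. The short Python sketch below, an illustration of ours rather than part of the original computation, applies it with N = 25 to the six r’s of Table II. It assumes that the first column of that table holds the “true” r’s, a reading the numbers bear out, since the results match the third column to four places.

```python
import math

def pe_r(r: float, n: int) -> float:
    """Probable error of a Pearson r from n cases: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)

# (label, first column of Table II, assumed here to be the "true" r)
TRUE_R = [
    ("r12", 0.7698),
    ("r13", 0.5773),
    ("r14", 0.3574),
    ("r23", 0.4800),
    ("r24", 0.2971),
    ("r34", 0.2228),
]

if __name__ == "__main__":
    for label, r in TRUE_R:
        # Each sample in the study had N = 25 cases
        print(f"{label}: r = {r:.4f}, PE = {pe_r(r, 25):.4f}")
```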

Three tetrads were calculated for each of the two hundred samples
of twenty-five cases, viz., t1234, t1243, and t1342. The distributions
of these TD’s are given in Fig. 2; and in Table III are presented the
relevant data upon these distributions, together with the tests for
normality. Several interesting facts stand out in Table III. In the
first place, it is clear that the average observed TD’s do not deviate
significantly from their theoretical “true” values of .0000. The PE’s
of the true TD’s, as calculated from a sample of twenty-five by the
Spearman-Holzinger formula, are also very close to the PE’s around
the mean of the distribution of observed TD values. In no case is
the difference between theoretical and observed PE’s significant.
From β1 and β2 it appears that the first and third curves (t1234 and t1342)
are somewhat leptokurtic, but no one of the three curves deviates
significantly from the normal form. It seems clear, therefore, that
when only one general factor and specific factors are present, the
sampling distribution of a tetrad difference may be taken to be normal.
Under the prescribed conditions, the Spearman-Holzinger PE formula
for the sampling distribution of a tetrad offers an adequate description.

Fig. 2.—Frequency Distributions of 3 TD’s Calculated from 200 Groups of 25 Cases Each.
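The Python sketch below makes two of the checks in Table III concrete. It is our illustration, and it rests on stated assumptions. First, it computes the three tetrad differences t_abcd = r_ab r_cd − r_ac r_bd from the first column of Table II, read as the “true” r’s. With a single general factor these should vanish, and with the rounded r’s they come out below .00005 in absolute value. (The label t1243 is our reading of a garbled subscript.) Second, it computes the large-sample probable errors of √β1 and β2 for 200 samples, .6745√(6/200) and .6745√(24/200). We take these to be the last two columns of Table III, and they do reproduce .1168 and .2337.

```python
import math

# First column of Table II, read here as the population ("true") r's
R = {
    (1, 2): 0.7698, (1, 3): 0.5773, (1, 4): 0.3574,
    (2, 3): 0.4800, (2, 4): 0.2971, (3, 4): 0.2228,
}

def r(i: int, j: int) -> float:
    """Symmetric lookup of r_ij."""
    return R[(min(i, j), max(i, j))]

def tetrad(a: int, b: int, c: int, d: int) -> float:
    """Tetrad difference t_abcd = r_ab * r_cd - r_ac * r_bd."""
    return r(a, b) * r(c, d) - r(a, c) * r(b, d)

def pe_sqrt_beta1(n: int) -> float:
    """Large-sample PE of sqrt(beta1) under normality: .6745 * sqrt(6/n)."""
    return 0.6745 * math.sqrt(6.0 / n)

def pe_beta2(n: int) -> float:
    """Large-sample PE of beta2 under normality: .6745 * sqrt(24/n)."""
    return 0.6745 * math.sqrt(24.0 / n)

# (beta1, beta2) for each tetrad distribution, copied from Table III
TABLE_III = {"t1234": (0.0106, 3.5292), "t1243": (0.0067, 3.0018), "t1342": (0.0287, 3.3259)}

if __name__ == "__main__":
    for name in ("1234", "1243", "1342"):
        print(f"t{name} = {tetrad(*map(int, name)):+.5f}")
    for name, (b1, b2) in TABLE_III.items():
        # Departures from the normal values (0 and 3), expressed in PE units
        print(name, round(math.sqrt(b1) / pe_sqrt_beta1(200), 2),
              round((b2 - 3.0) / pe_beta2(200), 2))
```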
THE EFFECT OF THE INTERVAL BETWEEN TEST
AND RETEST ON THE CONSTANCY OF THE IQ 


ROBERT L. THORNDIKE 


Columbia University 


There are reported, in the psychological literature, a number of 
experiments in which a group of individuals, generally children, were 
tested with the Binet intelligence test and then retested after an 
interval. Commonly, coefficients of correlation have been computed 
between the IQ’s on test and retest. These correlations vary widely,
as do also the intervals between test and retest. The purpose of this 
paper is to bring together the results of these various experimenters 
and, by the method of least squares, determine how the coefficient of 
correlation varies with the interval between test and retest. 

Rather than fit a curve to the obtained values of r, whose sampling 
distribution is badly skewed in the high values that we are considering, 
and whose standard error depends upon the value of the true correlation
in the population, we have converted all our r’s to z’s, as defined
by R. A. Fisher, and fitted our curves to the z’s. The conversion
equation is 


$$ z = \tfrac{1}{2}\{\log_e(1 + r) - \log_e(1 - r)\} \qquad (1) $$


z has the advantages (1) that its sampling distribution approaches the
normal for samples of the size that we have to consider and (2) that
its standard error is independent of the value of the true correlation 
in the population. When N is the number of individuals in the sample, 
the standard error of z is 


$$ \sigma_z = \frac{1}{\sqrt{N - 3}} \qquad (2) $$
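Equations (1) and (2) translate directly into code. The sketch below is a minimal Python illustration. The logarithms are natural logarithms, so fisher_z(r) equals math.atanh(r). The (N, r) pairs are taken from the table of test-retest results.

```python
import math

def fisher_z(r: float) -> float:
    """Equation (1): z = 1/2 [log_e(1 + r) - log_e(1 - r)], equivalently atanh(r)."""
    return 0.5 * (math.log(1.0 + r) - math.log(1.0 - r))

def se_z(n: int) -> float:
    """Equation (2): standard error of z for a sample of n cases."""
    return 1.0 / math.sqrt(n - 3)

if __name__ == "__main__":
    # A few (N, r) pairs from the table of test-retest correlations
    for n, r in [(25, 0.95), (103, 0.798), (351, 0.74)]:
        print(f"N={n:4d}  r={r:.3f}  z={fisher_z(r):.3f}  SE(z)={se_z(n):.4f}")
```

Because the variance of z is 1/(N − 3), weighting each point by N − 3 is the same as weighting by the reciprocal of its variance.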


In fitting our curves by least squares, it was necessary to assume 
that the time interval between test and retest was known accurately. 
This was not the case. In some samples the interval was the same for 
all individuals and was known with a high degree of accuracy, while in 
others it varied for different individuals over a rather wide range. The 
original experimenter would say that “the children were tested and
then retested after an interval of from twelve to twenty-four months.” 
It would perhaps have been desirable to fit a curve by making the sum 
of the perpendicular distances from the points to the line (after both the 
time and z had been divided by their variances) a minimum. But as 
the variance in each direction was different for each point, it was not 
possible to apply this method with the knowledge at hand. The only
alternative seemed to be to assume that the time interval between 
test and retest was definitely known, not using those results in which 
the range of times included in a given correlation was too great. 








Experimenter              N     t, months                 r      z
Cuneo and Terman          25    0                         .95    1.832
(illegible)               30    0                         .95    1.832
(illegible)              221    0–12                      .91    1.528
Cuneo and Terman          21    5–7                       .942   1.755
(illegible)              103    0–18                      .798   1.093
(illegible)               69    7¾ or 11 (Mn. 10.25)      .82    1.153
(illegible)              351    6–18 (Mn. 11)             .74     .950
(illegible)              173    12                        .901   1.475
(illegible)              298    12                        .88    1.376
Garrison and Robinson    131    12                        .88    1.376
Garrison and Robinson    131    12                        .92    1.589
Gray and Marsden         100    12                        .883   1.389
Gray and Marsden          42    12                        .834   1.201
Rugg and Colloton        137    10–16                     .84    1.221
(illegible)              149    14 (Av.)                  .87    1.333
(illegible)              320    12–24                     .87    1.333
Cuneo and Terman          31    20–24                     .852   1.263
(illegible)              273    19–30 (Mn. 23)            .67     .811
(illegible)              139    24                        .817   1.147
(illegible)              127    24                        .91    1.528
Garrison and Robinson    131    24                        .91    1.528
Gray and Marsden          42    24                        .839   1.218
(illegible)               37    19–30                     .699    .866
(illegible)              149    29 (Av.)                  .70     .867
(illegible)               99    24–36                     .88    1.376
(illegible)               44    30.7 (Av.)                .84    1.221
(illegible)               82    31–48 (Av. 35)            .56     .633
(illegible)              105    36                        .797   1.091
Gray and Marsden          42    36                        .843   1.231
(illegible)                6    31–42                     .793   1.079
(illegible)               34    41                        .85    1.256
(illegible)               41    36–48                     .87    1.333
(illegible)               71    48                        .786   1.062
(illegible)               43    48                        .83    1.188
(illegible)                6    43–66                     .801   1.101
(illegible)               37    60                        .812   1.133

















We finally included the results of thirteen experimenters in our 
work, and fitted curves to thirty-six correlations which they give. 
We list below the data to which our curves were fitted. Column I
gives the name of the experimenter; Column II lists the number in 
each sample; Column III gives the available information about the 
interval between test and retest; Column IV gives the value of r 
obtained; Column V gives the value of z. 

In the remainder of this paper, t will be understood to mean the
interval between test and retest in months. w signifies the weight 
applied to a given point. Each point was weighted by the reciprocal 
of its variance, that is, by (N — 3). 

We first fitted a straight line to the data. The equation is of the
form z = A + Bt, where A and B are determined from the equations 


$$ \begin{aligned} A\sum w + B\sum wt &= \sum wz \\ A\sum wt + B\sum wt^2 &= \sum wtz \end{aligned} \qquad (3) $$


Substituting numerical values, we get 


$$ \begin{aligned} 3{,}732A + 71{,}080B &= 4{,}624.76 \\ 71{,}080A + 1{,}795{,}298B &= 84{,}038.26 \end{aligned} $$

$$ A = 1.415, \qquad B = -.00916 \qquad (4) $$

The best-fitting straight line is

$$ z = 1.415 - .00916t \qquad (5) $$
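The weighted least-squares fit of equations (3) can be sketched as follows. This is a Python illustration, not the author's worksheet. weighted_sums accumulates the required sums with weight w = N − 3 from a list of (N, t, z) points, and solve_2x2 applies Cramer's rule. Solving the published sums directly gives A ≈ 1.414 and B ≈ −.00916, close to the values in (4).

```python
def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve [[a11, a12], [a21, a22]] @ [x, y] = [b1, b2] by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det

def weighted_sums(points):
    """Sums needed by equations (3); points are (N, t, z) with weight w = N - 3."""
    sw = swt = swt2 = swz = swtz = 0.0
    for n, t, z in points:
        w = n - 3
        sw += w
        swt += w * t
        swt2 += w * t * t
        swz += w * z
        swtz += w * t * z
    return sw, swt, swt2, swz, swtz

def fit_line(points):
    """Return (A, B) of the weighted least-squares line z = A + B t."""
    sw, swt, swt2, swz, swtz = weighted_sums(points)
    return solve_2x2(sw, swt, swt, swt2, swz, swtz)

if __name__ == "__main__":
    # Published sums from the numerical form of equations (3)
    A, B = solve_2x2(3732, 71080, 71080, 1795298, 4624.76, 84038.26)
    print(f"A = {A:.4f}, B = {B:.5f}")
```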


The theoretical variance from the trend line of a point of unit weight 
is unity. The observed variance from the trend line of a point of unit 
weight is given by the equation 


$$ s^2 = \frac{\sum v_i^2 (N_i - 3)}{n - 2} \qquad (6) $$

where v is the difference between the observed z and the z determined
by the equation and n is the number of points from which the trend
line was determined. s² is found to be 6.1845. The agreement
between the observed and theoretical variances may be tested by the
fact that

$$ \chi^2 = (n - 2)s^2 = \sum v_i^2 (N_i - 3) \qquad (7) $$

with (n − 2) degrees of freedom. As n − 2 = 34, we must test the
agreement by making use of the fact that $\sqrt{2\chi^2} - \sqrt{2(n - 2) - 1}$ is
normally distributed with mean at zero and standard error of unity
for samples of this size. In this case, $\sqrt{2\chi^2} - \sqrt{2(n - 2) - 1}$ comes
out to be 12.33. This value could practically never arise by chance, so
we may feel sure that our observed variance is significantly greater
than is to be expected from theoretical considerations.
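The variance test can be reproduced in a few lines. The Python sketch assumes, as the text does, that a point of unit weight has a theoretical variance of unity, so that χ² = (n − 2)s² on n − 2 = 34 degrees of freedom. With s² = 6.1845 it gives a normal deviate of about 12.3, in line with the reported 12.33.

```python
import math

def variance_test(s2: float, df: int, sigma2_theory: float = 1.0):
    """Return chi^2 = df * s2 / sigma2 and the large-df normal deviate
    sqrt(2 chi^2) - sqrt(2 df - 1), approximately N(0, 1) if the variances agree."""
    chi2 = df * s2 / sigma2_theory
    deviate = math.sqrt(2.0 * chi2) - math.sqrt(2.0 * df - 1.0)
    return chi2, deviate

if __name__ == "__main__":
    # Straight-line fit: n = 36 points, two constants fitted, s^2 = 6.1845
    chi2, dev = variance_test(6.1845, 36 - 2)
    print(f"chi^2 = {chi2:.2f}, deviate = {dev:.2f}")
```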

This great excess of observed over theoretical variance may be 
open to a variety of explanations. In the first place, a straight line 
may not adequately fit the data at hand. That this is not one of the 
more important causes of the excess variance is shown by the fact that 
(as we shall see) a second degree curve does not reduce the variance of 
the residuals greatly. We believe the chief causes of the unduly large 
variance to be (1) variation in the adequacy of testing and retesting 
from experimenter to experimenter and (2) different ranges of ability 
among the different groups examined. 

Knowing the variance of a point of unit weight, it is possible to 
compute the variances and standard errors of the coefficients A and B 
in equation (5). Consider the determinant 


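$$ \begin{vmatrix} \sum w & \sum wt \\ \sum wt & \sum wt^2 \end{vmatrix} $$

Let us call this determinant Δ. It can readily be shown that

$$ \sigma_A^2 = \frac{s^2 \sum wt^2}{\Delta}, \qquad \sigma_B^2 = \frac{s^2 \sum w}{\Delta} \qquad (8) $$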
These variances are correlated with one another, so the ordinary tests
of significance do not hold, but a comparison of the values of the coefficients
with their standard errors gives us some information about the
importance to be attached to the coefficients. The standard errors of
the coefficients are found to be

$$ \sigma_A = .082, \qquad \sigma_B = .0015 $$


Inasmuch as the coefficient of the linear term is six times its standard
error, we may feel reasonably sure that there is a real drop in the value
of z as t increases.


We then fitted a second degree curve to the data by the equations

$$ \begin{aligned} A\sum w + B\sum wt + C\sum wt^2 &= \sum wz \\ A\sum wt + B\sum wt^2 + C\sum wt^3 &= \sum wtz \\ A\sum wt^2 + B\sum wt^3 + C\sum wt^4 &= \sum wt^2 z \end{aligned} \qquad (9) $$


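These become

$$ \begin{aligned} 3{,}732A + 71{,}080B + 1{,}795{,}298C &= 4{,}624.76 \\ 71{,}080A + 1{,}795{,}298B + 56{,}796{,}141C &= 84{,}038.26 \\ 1{,}795{,}298A + 56{,}796{,}141B + 2{,}125{,}501{,}513C &= 2{,}060{,}510.93 \end{aligned} $$

$$ A = 1.616, \qquad B = -.0301, \qquad C = .000409 \qquad (10) $$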
The equation is

$$ z = 1.616 - .0301t + .000409t^2 \qquad (11) $$
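The three normal equations (9), with the numerical coefficients printed above, can be solved by ordinary Gaussian elimination. The plain-Python sketch below is our illustration. It uses partial pivoting because the coefficients span six orders of magnitude. In our arithmetic it lands at about A = 1.616, B = −.0301, C = .000409, matching (10).

```python
def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    a = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Bring the row with the largest pivot into position
        p = max(range(col, n), key=lambda i: abs(a[i][col]))
        a[col], a[p] = a[p], a[col]
        for i in range(col + 1, n):
            f = a[i][col] / a[col][col]
            for j in range(col, n + 1):
                a[i][j] -= f * a[col][j]
    # Back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (a[i][n] - s) / a[i][i]
    return x

# Coefficients and right-hand sides of the numerical form of equations (9)
M = [
    [3732, 71080, 1795298],
    [71080, 1795298, 56796141],
    [1795298, 56796141, 2125501513],
]
RHS = [4624.76, 84038.26, 2060510.93]

if __name__ == "__main__":
    A, B, C = solve_linear(M, RHS)
    print(f"A = {A:.3f}, B = {B:.4f}, C = {C:.6f}")
```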
The variance of a point of unit weight is now given by the formula

$$ s^2 = \frac{\sum v_i^2 (N_i - 3)}{n - 3} \qquad (12) $$


and is found to be 6.0497. This is such a slight reduction from the 
variance from the straight line that it seems doubtful whether the 
second degree equation is much of an improvement over the straight 
line. When we compute the standard errors of the coefficients A, B, 
and C we get 


$$ \sigma_A = .148, \qquad \sigma_B = .0132, \qquad \sigma_C = .000248 $$


Again the values of the standard errors are correlated with one another, 
so we do not know exactly what they signify, but the fact that the 
quadratic term is only 1.65 times its standard error suggests that this 
quadratic term is not of very great importance. 

It is possible to convert either the linear or the quadratic equation
back into an equation in r, to show how r decreases with an increase in t.
We convert the linear equation as follows:

$$ z = \tfrac{1}{2}\{\log_e(1 + r) - \log_e(1 - r)\} $$

$$ e^{2z} = \frac{1 + r}{1 - r} $$

$$ r = \frac{e^{2z} - 1}{e^{2z} + 1} $$

$$ r = \frac{e^{2.830 - .01832t} - 1}{e^{2.830 - .01832t} + 1} $$

$$ r = \frac{16.96e^{-.01832t} - 1}{16.96e^{-.01832t} + 1} $$


The values of r for different values of t can be shown in tabular
form as given below.
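The converted linear equation can also be evaluated by machine for any interval. The Python sketch below takes A = 1.415 and B = −.00916 from equation (5) and computes r = (e^{2z} − 1)/(e^{2z} + 1). It uses the unrounded exponent rather than the rounded constant 16.96, so the third decimal may occasionally differ from a hand computation. The intervals of 0 to 60 months are chosen only for illustration.

```python
import math

A, B = 1.415, -0.00916  # constants of the straight line, equation (5)

def r_at(t_months: float) -> float:
    """Convert z = A + B t back to r via r = (e^{2z} - 1) / (e^{2z} + 1)."""
    e2z = math.exp(2.0 * (A + B * t_months))
    return (e2z - 1.0) / (e2z + 1.0)

if __name__ == "__main__":
    for t in (0, 12, 24, 36, 48, 60):
        print(f"t = {t:2d} months  r = {r_at(t):.3f}")
```

The expression (e^{2z} − 1)/(e^{2z} + 1) is simply tanh z, the inverse of the transformation in equation (1).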
In Chart I we show graphically the scatter of points to which the
curves were fitted (each point is marked with its own weight), and the 
linear and quadratic curves which were determined as fitting the points. 

In conclusion, we can say that the correlation between Binet test 
and retest falls off as the interval between tests is increased. As far 
as we have been able to determine, a linear equation expresses the 
relationship between t and z adequately. Our least squares solution for
this line is

$$ z = 1.415 - .00916t. $$
THE UTILIZATION OF DATA FROM SIMPLE OR
DIRECT PREDICTION IN THE DEVELOPMENT 
OF REGRESSION EQUATIONS FOR 
DIFFERENTIAL PREDICTION 


J. MURRAY LEE 
City Schools, Burbank, Calif. 
AND 
DAVID SEGEL 


Office of Education, Washington, D. C. 


The development of a method of differential prediction between
two criteria has been described recently.¹ By this method differences
between scores on various tests or marks in various subjects may be
predicted. In the development of this method correlation coefficients
were calculated directly between the predictive items and the
differences.

It is the purpose in this paper to point out a simple method for the
development of differential prediction equations. This method uses
the zero-order correlation coefficients between the predictive items and
each of the measures between which a difference is to be predicted.
Let a and b be the scores made in two different fields of achievement
and x, y, z, . . . , be the predictive items. The differential prediction
procedure requires the correlations $r_{(a-b)x}$, $r_{(a-b)y}$, $r_{(a-b)z}$, . . . . A
method has been developed for obtaining such correlations from a
knowledge of correlations such as $r_{ax}$, $r_{bx}$, $r_{ay}$, $r_{by}$, . . . , and of
other data. This method eliminates the need of finding the difference
for each item and, in a study using several variables, will eliminate the
necessity of calculating a number of correlations.

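The development of a formula for $r_{(a-b)x}$ by using $r_{ax}$, $r_{bx}$, and other
data is as follows (a, b, and x being measured as deviations from their means):

$$ r_{(a-b)x} = \frac{\sum (a - b)x}{N\sigma_{(a-b)}\sigma_x} $$

but

$$ \frac{\sum (a - b)x}{N} = \frac{\sum ax}{N} - \frac{\sum bx}{N} = r_{ax}\sigma_a\sigma_x - r_{bx}\sigma_b\sigma_x $$

and since

$$ \sigma_{(a-b)} = \sqrt{\sigma_a^2 + \sigma_b^2 - 2r_{ab}\sigma_a\sigma_b} $$

therefore

$$ r_{(a-b)x} = \frac{r_{ax}\sigma_a\sigma_x - r_{bx}\sigma_b\sigma_x}{\sigma_x\sqrt{\sigma_a^2 + \sigma_b^2 - 2r_{ab}\sigma_a\sigma_b}} $$

or

$$ r_{(a-b)x} = \frac{r_{ax}\sigma_a - r_{bx}\sigma_b}{\sqrt{\sigma_a^2 + \sigma_b^2 - 2r_{ab}\sigma_a\sigma_b}} \qquad \text{(Formula No. 1)} $$

¹ Segel, David: Differential Prediction of Ability as Represented by College
Subject Groups. Journal of Educational Research, Vol. XXV, Nos. 1 & 2 (January
and February, 1932), pp. 14–26, 93–98.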


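In case sigma scores are used, i.e.,

$$ Z_a = \frac{X_a - M_a}{\sigma_a}, \qquad Z_b = \frac{X_b - M_b}{\sigma_b}, \qquad \text{etc.}, $$

then

$$ r_{(Z_a - Z_b)Z_x} = \frac{\sum (Z_a - Z_b)Z_x}{N\sigma_{(Z_a - Z_b)}\sigma_{Z_x}} $$

but since

$$ \frac{\sum (Z_a - Z_b)Z_x}{N} = \frac{\sum (Z_aZ_x - Z_bZ_x)}{N} = r_{ax} - r_{bx} $$

and

$$ \sigma_{(Z_a - Z_b)} = \sqrt{2 - 2r_{ab}}, \qquad \sigma_{Z_x} = 1 $$

then

$$ r_{(Z_a - Z_b)Z_x} = \frac{r_{ax} - r_{bx}}{\sqrt{2 - 2r_{ab}}} \qquad \text{(Formula No. 2)} $$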
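Formulas No. 1 and No. 2 are algebraic identities rather than approximations, so they can be checked on any set of scores. The Python sketch below uses a small set of scores for eight pupils that is hypothetical, invented for illustration and not taken from the article. It computes standard deviations with divisor N, as in the derivation. It then compares each formula with the correlation computed directly from the differences.

```python
import math

def mean(v):
    return sum(v) / len(v)

def sd(v):
    """Standard deviation with divisor N, as in sigma = sqrt(sum d^2 / N)."""
    m = mean(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / len(v))

def corr(u, v):
    mu, mv = mean(u), mean(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v)) / len(u)
    return cov / (sd(u) * sd(v))

def formula_1(r_ax, r_bx, r_ab, s_a, s_b):
    return (r_ax * s_a - r_bx * s_b) / math.sqrt(s_a**2 + s_b**2 - 2 * r_ab * s_a * s_b)

def formula_2(r_ax, r_bx, r_ab):
    return (r_ax - r_bx) / math.sqrt(2 - 2 * r_ab)

# Hypothetical scores (illustration only)
a = [10, 9, 8, 7, 7, 6, 5, 4]
b = [8, 9, 10, 7, 5, 7, 6, 5]
x = [15, 13, 10, 12, 14, 9, 8, 11]

r_ab, r_ax, r_bx = corr(a, b), corr(a, x), corr(b, x)
direct = corr([p - q for p, q in zip(a, b)], x)     # r computed from actual differences
via_1 = formula_1(r_ax, r_bx, r_ab, sd(a), sd(b))   # r from zero-order r's

za = [(p - mean(a)) / sd(a) for p in a]             # sigma scores
zb = [(q - mean(b)) / sd(b) for q in b]
direct_z = corr([p - q for p, q in zip(za, zb)], x)
via_2 = formula_2(r_ax, r_bx, r_ab)

if __name__ == "__main__":
    print(round(direct, 6), round(via_1, 6))
    print(round(direct_z, 6), round(via_2, 6))
```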


The use of formula No. 1 is illustrated by the table on page 552. 

The correlations and standard deviations necessary for the computation
in formula No. 1 are: $r_{ab} = .655$, $r_{ax} = .310$, $r_{bx} = -.182$,
$\sigma_a = 1.58$, and $\sigma_b = 1.69$. Substituting these values in formula No. 1,

$$ r_{(a-b)x} = \frac{r_{ax}\sigma_a - r_{bx}\sigma_b}{\sqrt{\sigma_a^2 + \sigma_b^2 - 2r_{ab}\sigma_a\sigma_b}} $$

it becomes

$$ r_{(a-b)x} = \frac{(.310)(1.58) - (-.182)(1.69)}{\sqrt{(1.58)^2 + (1.69)^2 - 2(.655)(1.58)(1.69)}} $$

or

$$ r_{(a-b)x} = .59. $$


When $r_{(a-b)x}$ is calculated directly from the data, that is, when the
actual differences (a — b) and the predictive item (x) are used in the 
calculation of the correlation coefficient, the result is also equal to .59. 
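The substitution in formula No. 1 can be reproduced as follows. This is a minimal Python sketch that uses only the values quoted in the text. The same computation gives σ_(a−b) ≈ 1.362, the value used for the regression equation later in the article.

```python
import math

# Values quoted in the text for the illustrative table
r_ab, r_ax, r_bx = 0.655, 0.310, -0.182
s_a, s_b = 1.58, 1.69

s_diff = math.sqrt(s_a**2 + s_b**2 - 2 * r_ab * s_a * s_b)  # sigma_(a-b)
r_diff_x = (r_ax * s_a - r_bx * s_b) / s_diff               # formula No. 1

if __name__ == "__main__":
    print(f"sigma_(a-b) = {s_diff:.3f}, r_(a-b)x = {r_diff_x:.2f}")
```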
Predictive   Score A   Score B   Difference between
item x                           scores (a − b)
    15          10         8            2
    13          10         9            1
    15           9         8            1
    17           9         8            1
    10           8         8            0
     7           8         9           −1
    13           8        10           −2
    10           7         9           −2
     7           7         9           −2
     8           7         7            0
     6           7         7            0
    12           6         5            1
    15           6         4            2
    14           6         6            0
    12           6         7           −1
    12           6         7           −1
     9           5         8           −3
     7           5         6           −1
    10           5         5            0
    13           5         4            1














This finding corroborates the fact, established mathematically, that the
differential prediction correlation coefficient $r_{(a-b)x}$ obtained by formula
No. 1 is a true $r_{(a-b)x}$ and not an approximation. The standard deviations
$\sigma_a$ and $\sigma_b$ used in formula No. 1 should be in the same units
as are used in the calculation of the correlation coefficients.

Let us now consider the calculation of the differential prediction
equation when $r_{(a-b)x}$, $r_{(a-b)y}$, . . . , are obtained by this new method.


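The general regression equation and the probable error of estimate of
an individual score are:¹

$$ X_d = b_{d1\cdot 23\ldots n}X_1 + b_{d2\cdot 13\ldots n}X_2 + \cdots + b_{dn\cdot 12\ldots(n-1)}X_n + C $$

and

$$ C = M_d - b_{d1\cdot 23\ldots n}M_1 - b_{d2\cdot 13\ldots n}M_2 - \cdots - b_{dn\cdot 12\ldots(n-1)}M_n $$

and

$$ \sigma_{di}^2 = b_{d1\cdot 23\ldots n}^2\,\sigma_1^2(1 - r_{11}) + b_{d2\cdot 13\ldots n}^2\,\sigma_2^2(1 - r_{22}) + \cdots + b_{dn\cdot 12\ldots(n-1)}^2\,\sigma_n^2(1 - r_{nn}) $$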


¹ Ibid., pages 97 and 98, formula Nos. 1 and 2.
In this formula d stands for a difference such as (a − b). The regression
equation for the prediction of (a − b) from X is:

$$ X_d = r_{(a-b)x}\frac{\sigma_{(a-b)}}{\sigma_x}(X - M_x) + M_{(a-b)} $$

In this equation $r_{(a-b)x}$ is found by formula No. 1,

$$ \sigma_{(a-b)} = \sqrt{\sigma_a^2 + \sigma_b^2 - 2r_{ab}\sigma_a\sigma_b}, \qquad \text{and} \qquad M_{(a-b)} = M_a - M_b. $$


The means and $\sigma_x$ may be found while calculating $r_{ax}$ or $r_{bx}$. All
sigmas used in this regression equation must be expressed in original
score form and not in grouped form. Substituting the obtained values
in the regression equation

$$ X_d = r_{(a-b)x}\frac{\sigma_{(a-b)}}{\sigma_x}(X - M_x) + M_{(a-b)}, $$

it follows that

$$ X_d = .59\,\frac{1.362}{3.06}(X - 11.25) + (-.2) $$

or

$$ X_d = .263X - 3.159 $$


The prediction of a difference from some particular X is made by
substituting the value of X in the equation and solving for $X_d$, or
(a − b). Suppose X = 16; then $X_d = .263(16) - 3.159 = 1.049$.
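The regression equation and the prediction for X = 16 can be reproduced from the summary values quoted above. In this Python sketch the slope is carried unrounded, so the intercept comes out near −3.154 rather than the text's −3.159. The text's figure follows from first rounding the slope to .263. Either way, the predicted difference for X = 16 is about 1.05.

```python
# Summary values quoted in the text
r_dx = 0.59                 # r_(a-b)x from formula No. 1
s_d, s_x = 1.362, 3.06      # sigma_(a-b) and sigma_x
m_d, m_x = -0.2, 11.25      # M_(a-b) and M_x

slope = r_dx * s_d / s_x
intercept = m_d - slope * m_x

def predict_difference(X: float) -> float:
    """Predicted (a - b) for a score X on the predictive item."""
    return slope * X + intercept

if __name__ == "__main__":
    print(f"X_d = {slope:.3f} X {intercept:+.3f}")
    print(f"X = 16 -> predicted (a - b) = {predict_difference(16):.2f}")
```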
The standard error of estimate of this predicted difference for an
individual would be expressed by the equation

$$ \sigma_{di}^2 = \left[r_{(a-b)x}\frac{\sigma_{(a-b)}}{\sigma_x}\right]^2\sigma_x^2(1 - r_{xx}) $$

or

$$ \sigma_{di} = \sqrt{\left[r_{(a-b)x}\,\sigma_{(a-b)}\right]^2(1 - r_{xx})} $$

where $r_{xx}$ is the reliability of the x variable. Assuming that the
reliability ($r_{xx}$) is equal to .85 for purposes of illustration, the standard
error of estimate of a predicted individual difference score is

$$ \sigma_{di} = \sqrt{[(.59)(1.362)]^2(1 - .85)} = .31. $$


The probable error of estimate is $.6745\sigma_{di}$, or .21. The result of
predicting the difference (a − b) from a score of 16 is therefore expressed
as +1.05 ± .21.
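The standard and probable errors of an individual predicted difference follow in the same way. The Python sketch below is our illustration. It uses r_(a−b)x = .59, σ_(a−b) = 1.362, and the reliability of .85 that the text assumes for purposes of illustration.

```python
import math

def se_individual(r_dx: float, s_d: float, rel_x: float) -> float:
    """sigma_di = sqrt([r_(a-b)x * sigma_(a-b)]^2 * (1 - r_xx))."""
    return math.sqrt((r_dx * s_d) ** 2 * (1.0 - rel_x))

se = se_individual(0.59, 1.362, 0.85)  # reliability .85 is the text's illustrative value
pe = 0.6745 * se                       # probable error = .6745 sigma

if __name__ == "__main__":
    print(f"sigma = {se:.2f}, PE = {pe:.2f}")
```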
The prediction of a difference between two criteria by means of
one predictive item has been developed. By similar methods, predictions
of such differences using two or more predictive items may be
made. It is only required to work out the necessary zero-order correlation
coefficients, e.g., $r_{(a-b)x}$, $r_{(a-b)y}$, $r_{(a-b)z}$, . . . , and use them in the
regression equation of the proper order.
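The extension to two predictive items can be sketched as follows. This illustration rests on loudly stated assumptions. The correlations of a hypothetical second item y with a and b (.40 and .25) and the correlation r_xy = .30 are made-up numbers. The standard-score weights come from the usual textbook solution of the two-predictor normal equations, not from a formula given in the article. Each zero-order r with the difference is obtained by formula No. 1, which is the point of the method.

```python
import math

def diff_corr(r_a, r_b, r_ab, s_a, s_b):
    """Formula No. 1: r between (a - b) and a predictor, from zero-order r's."""
    return (r_a * s_a - r_b * s_b) / math.sqrt(s_a**2 + s_b**2 - 2 * r_ab * s_a * s_b)

def two_predictor_betas(r_dx, r_dy, r_xy):
    """Standard-score weights for predicting d from x and y (textbook formula)."""
    det = 1.0 - r_xy**2
    return (r_dx - r_dy * r_xy) / det, (r_dy - r_dx * r_xy) / det

r_ab, s_a, s_b = 0.655, 1.58, 1.69
r_dx = diff_corr(0.310, -0.182, r_ab, s_a, s_b)  # values quoted in the text
# Hypothetical second predictor y: r_ay = .40, r_by = .25, r_xy = .30 (made up)
r_dy = diff_corr(0.40, 0.25, r_ab, s_a, s_b)
r_xy = 0.30
beta_x, beta_y = two_predictor_betas(r_dx, r_dy, r_xy)
multiple_r = math.sqrt(beta_x * r_dx + beta_y * r_dy)  # R^2 = sum of beta * r

if __name__ == "__main__":
    print(f"beta_x = {beta_x:.3f}, beta_y = {beta_y:.3f}, R = {multiple_r:.3f}")
```

The weights satisfy the normal equations β_x + β_y r_xy = r_(a−b)x and β_x r_xy + β_y = r_(a−b)y. Multiplying each β by σ_(a−b) divided by the standard deviation of its predictor would turn the weights into raw-score coefficients.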

There are certain advantages in being able to get $r_{(a-b)x}$ from $r_{ax}$ and
$r_{bx}$. First, such correlation coefficients as $r_{ax}$ and $r_{bx}$ will usually be
calculated in a prediction problem because of their use in direct prediction.
The value in this case is in the economy of time and effort. Second,
some of the studies of direct prediction already made may be investigated
again for values in differential prediction even though the original
data are not available. This re-study is made possible because only
the simple correlations, sigmas, and means are necessary for the
calculation of $r_{(a-b)x}$, $r_{(a-b)y}$, . . . , by the method here developed.
BOOK REVIEW


Francis P. Robinson. The Rôle of Eye Movements in Reading with
an Evaluation of Techniques for Their Improvement. University
of Iowa Studies, No. 39, Pp. 52. Iowa City: University of Iowa, 
1933. 


During recent years both research workers and administrative 
officials have come to recognize the marked handicap of reading defi- 
ciencies among college students. Various investigators have shown 
convincingly that reading ability may be improved by an appreciable 
amount provided that adequate motivation is furnished the subject.
The emphasis on improvement of reading proficiency in “how to
study” courses and the establishment of reading clinics in colleges and
universities provide practical means of relieving to some degree the 
impediment imposed upon the student by slow reading and poor 
comprehension. 

In this monograph are reported the results of a study to improve 
reading disabilities of students at the University of Iowa Reading 
Clinic. The author’s purpose was to “evaluate the rôle of eye-movement
habits in determining reading ability.” The analysis of his
results would seem to indicate that the invalid hypothesis implied in 
this purpose has led the author to conclusions unwarranted by his 
data and unsupported by sound inferences from the findings in other 
investigations. 

There were two experimental groups and a control group, of twenty-one
students each. All were deficient in reading. One group was trained
in comprehension by the familiar techniques but with no emphasis on 
speed. Another group was trained to increase speed of reading by the 
ordinary methods, and in addition was given exercises designed to 
promote more regular and less frequent fixations along the lines of 
print. The subjects were strongly motivated. Initial and end meas- 
urements consisting of standardized tests and eye-movement records 
revealed no significant gains by the control group, significant gains in 
comprehension but not in rate of reading by the “comprehension
group,” and significant gains in rate (including eye-movement measures)
by the “rate group.” In the latter group there was also some
gain in comprehension. There is no question about these facts: (1) 
Training for comprehension improved comprehension but not rate; 
and (2) training for rate improved rate and to a lesser degree improved 
comprehension. 
The author terms all the training to increase speed of reading eye-movement
“pacing,” and states “that such training improved eye-movement
habits, which in turn increased reading efficiency.” In all
his analyses the author takes this position that he is training the eye- 
movements which in turn affect the reading proficiency. 

The monograph is a sound and worthwhile contribution to clinical 
psychology as far as the facts are concerned, but the interpretations 
cannot be accepted as valid. There are abundant data in the field 
demonstrating that (1) eye-movement habits are extremely flexible 
and consequently readily adapt themselves to any change in reading 
proficiency; and (2) any confusion or slowing up of mental processes, 
or any marked change in type of perception during reading is promptly 
reflected in oculomotor performance. Apparently the only justifiable 
inference is that eye-movements are merely symptoms rather than 
causes of reading proficiency. This contention is supported by the 
fact that remedial training with no reference at all to eye movements is 
eminently successful, and is to be preferred since it is more convenient. 
In fact, it appears that measurement of eye-movements may be dis- 
pensed with in the reading clinic without any appreciable loss. Very 
infrequently there is a case whose inaccurate return-sweeps to the 
beginnings of the succeeding lines may be helped by direct attention 
to eye-movements. It appears, therefore, that the author’s inference 
that eye-movements are causes of reading proficiency and that their 
improvement results in increased reading efficiency must be rejected 
as invalid. His term “pacing” the eye movements is unfortunate
because it may be accepted and perpetuated with its invalid implications
by the uncritical reader. MILES A. TINKER.

University of Minnesota. 